Consider three specific situations:

A paralegal comparing two versions of a settlement agreement. Both versions are marked confidential; the client relationship is governed by an NDA. The task is straightforward: find what changed between draft 3 and draft 4. But most popular online comparison tools send both documents to a server.

A freelance accountant extracting line items from a scanned tax form. The document contains a client's full name, Social Security Number, and income figures. They need the numbers in a spreadsheet. The obvious tool is an OCR service. Most OCR services upload the document.

An HR professional opening a confidential employee performance review in PDF format. They need to read it, not edit it. Their company laptop is managed, and they'd prefer not to open the review through a cloud service that logs activity.

These are not edge cases. They are normal daily work for document professionals. The default online workflow sends the document to a server you do not control.

Here is how to process sensitive documents without uploading them.

How to Process Sensitive Documents Without Uploading Them

Modern browsers can run complex software. This is not a recent development — it's been true since the widespread adoption of JavaScript, and it became far more capable with WebAssembly, which lets compiled C, C++, and Rust code run in the browser at near-native speed.

PDF.js is a JavaScript library maintained by Mozilla that renders PDFs entirely in the browser. It's what Firefox uses for its built-in PDF viewer; Chrome uses its own renderer. Tesseract.js brings Tesseract to the browser by compiling the engine to WebAssembly. Tesseract is the open-source OCR engine originally developed at HP, later developed with Google's sponsorship, and now maintained by its open-source community. Tesseract.js runs OCR entirely in the browser.

A properly built web tool that uses these libraries handles your document without any server involvement. The file loads into browser memory via the browser's FileReader API. The processing runs locally. The result is generated as a Blob, a binary object in browser memory, and made available as a download. Nothing is transmitted over the network except the initial download of the tool itself (HTML, CSS, JavaScript, WebAssembly files, and any data files the tool needs, such as OCR language data), which happens once and can be cached.
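The load, process, and download steps can be sketched in a few lines of TypeScript. This is a minimal illustration rather than the code of any particular tool: the function names are hypothetical, and it uses Blob.arrayBuffer(), a newer alternative to FileReader that reads the same local bytes.

```typescript
// Sketch of the local pipeline: bytes in, processing in memory, Blob out.
// All function names here are illustrative assumptions.

/** Reads a user-selected file into memory. In a page, `file` would come from
 *  an <input type="file"> element or a drop event (File extends Blob). */
async function readLocalFile(file: Blob): Promise<Uint8Array> {
  // arrayBuffer() copies the local file into memory; it makes no network request.
  return new Uint8Array(await file.arrayBuffer());
}

/** Checks for the "%PDF-" signature that a well-formed PDF starts with. */
function looksLikePdf(bytes: Uint8Array): boolean {
  const signature = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"
  return (
    bytes.length >= signature.length &&
    signature.every((byte, i) => bytes[i] === byte)
  );
}

/** Wraps locally produced text in a Blob, ready to be offered as a download. */
function toTextBlob(text: string): Blob {
  return new Blob([text], { type: "text/plain" });
}
```

In a page, the resulting Blob is typically handed to URL.createObjectURL and attached to a link with a download attribute, so the browser saves it straight from memory.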

The implication: a browser-based tool can process a file entirely on your device, just as desktop software does, while still being delivered over the web.

Use Case 1: Viewing a PDF Without Uploading

The simplest option requires no tool at all. Drag a PDF file onto an open Chrome, Firefox, Edge, or Safari tab (not onto an existing page — onto a blank tab or the browser's tab bar). The browser opens the PDF using its built-in viewer. No upload, no external service, nothing leaves your device.

This works for standard PDFs created from word processors, design tools, or any modern PDF-generating workflow. It handles text selection, in-document search (Ctrl+F or Cmd+F), zoom, and printing. For everyday sensitive documents, this covers the reading task cleanly.

Use a dedicated PDF application for interactive tax forms, XFA-format forms, documents with embedded 3D content, or signature validation that depends on a specific certificate trust chain. Those are specialized PDF workflows, not ordinary viewing tasks.

For a dedicated viewer with a cleaner interface: The PDF viewer runs PDF.js in the browser with a focused reading interface, without the browser's address bar, tab history, and bookmarks toolbar. It accepts either a URL or a local file.

The practical difference from just dragging into a tab: the file doesn't enter your browser's tab history or its "recently opened" document list. On a managed corporate laptop where browser activity is logged, or on a device that syncs tabs across personal and work profiles via Chrome Sync or Firefox Sync, this distinction matters.

Use Case 2: Comparing Two Contract Versions

Compare PDFs renders two documents side by side using PDF.js. Both files load into browser memory; neither is transmitted to any server. The comparison runs locally and produces a visual diff, with differences highlighted page by page.

What it surfaces: text additions and removals, reformatted paragraphs, page insertions and deletions. For contract review, this means you can identify clause changes, modified defined terms, and structural edits without needing both parties to be in the same room with printed copies.
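To make "text additions and removals" concrete, here is a minimal word-level diff based on the longest common subsequence. It is a sketch of the general technique, not a description of how Compare PDFs is implemented, and it assumes the text of each page has already been extracted into a string. The diffWords name is hypothetical.

```typescript
type DiffOp = { kind: "same" | "added" | "removed"; word: string };

/** Word-level diff of two strings using a longest-common-subsequence table. */
function diffWords(before: string, after: string): DiffOp[] {
  const a = before.split(/\s+/).filter(Boolean);
  const b = after.split(/\s+/).filter(Boolean);

  // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
  const lcs: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0)
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] =
        a[i] === b[j]
          ? lcs[i + 1][j + 1] + 1
          : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  // Walk both word lists, emitting shared words and the edits between them.
  const ops: DiffOp[] = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      ops.push({ kind: "same", word: a[i] });
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      ops.push({ kind: "removed", word: a[i] });
      i++;
    } else {
      ops.push({ kind: "added", word: b[j] });
      j++;
    }
  }
  while (i < a.length) ops.push({ kind: "removed", word: a[i++] });
  while (j < b.length) ops.push({ kind: "added", word: b[j++] });
  return ops;
}
```

For example, comparing "Payment of 100 dollars due in 30 days" with "Payment of 250 dollars due within 30 days" reports "100" and "in" as removed and "250" and "within" as added. The table grows with the product of the two word counts, so a sketch like this suits page-sized text rather than whole documents.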

The practical workflow for legal document comparison:

  1. Open Compare PDFs
  2. Load the earlier version on the left and the later version on the right
  3. Step through pages using the navigation controls
  4. Note any highlighted differences for review

For the settlement agreement scenario from the opening: both documents stay on the paralegal's machine. Whether handing a document to a third-party tool counts as "disclosure" under the NDA is a question that never arises, because the document never leaves the device.

The compare legal contracts variation covers this use case specifically.

A note on source documents: The visual diff works on the rendered text content of PDFs. If two contract versions were created from different source files (one from Word, one re-typeset from scratch), minor formatting differences such as line breaks, hyphenation, and spacing may appear in the comparison. For cleanest results, compare two versions exported from the same source document.
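If you compare extracted text yourself, one way to reduce this kind of noise is to normalize both versions before diffing. The sketch below is an assumption about how that could be done, not a feature of the tool, and normalizeExtractedText is a hypothetical name. It rejoins words split by a hyphen at a line break and collapses all whitespace. Note the trade-off: a genuinely hyphenated compound that happens to break at a line end (for example "third-" followed by "party") will be merged too.

```typescript
/** Removes layout-only differences from extracted text before comparison. */
function normalizeExtractedText(text: string): string {
  return text
    // Rejoin words hyphenated across a line break: "indemni-\nfication".
    .replace(/([A-Za-z])-\r?\n([A-Za-z])/g, "$1$2")
    // Collapse line breaks, tabs, and repeated spaces into single spaces.
    .replace(/\s+/g, " ")
    .trim();
}

// Two renderings of the same sentence with different line wrapping.
const fromWord = "The indemnification clause\napplies to both parties.";
const retypeset = "The indemni-\nfication  clause applies\nto both parties.";
const sameContent =
  normalizeExtractedText(fromWord) === normalizeExtractedText(retypeset);
```

Here sameContent is true: after normalization the two strings are identical, so only substantive edits remain to be flagged.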

Use Case 3: Extracting Text From a Scanned Document

The OCR tool uses Tesseract.js, a build of the Tesseract OCR engine compiled to WebAssembly so that it runs in the browser. It extracts text from images and scanned PDFs without uploading either.

For the accountant scenario: uploading a scanned tax form to a server-based OCR service sends the document contents to remote infrastructure. If the document contains a client's SSN and income information, that data is now governed by another service's terms and retention policy.

Using a browser-based OCR tool, the scanned image or PDF loads into browser memory, Tesseract processes it locally, and the extracted text appears in the browser. The SSN and income figures never leave the device.

Workflow:

  1. Open the OCR tool
  2. Select the scanned document from your local disk (a local file, not a URL, so it stays off the network)
  3. Select the language if it's not English
  4. Run OCR
  5. Copy the extracted text and paste it into your spreadsheet or document
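For the last step, a small script can turn OCR output into rows that paste cleanly into a spreadsheet. The sketch below is illustrative only. It assumes each line item appears on one line as a label followed by an amount with two decimal places, which real forms will not always satisfy, and the names parseLineItems and toTsv are hypothetical.

```typescript
interface LineItem {
  label: string;
  amount: number;
}

/** Extracts "label  amount" rows from OCR text; other lines are skipped. */
function parseLineItems(ocrText: string): LineItem[] {
  // A label, whitespace, an optional "$", then an amount such as 52,340.00.
  const pattern = /^(.+?)\s+\$?(\d[\d,]*\.\d{2})$/;
  const items: LineItem[] = [];
  for (const rawLine of ocrText.split(/\r?\n/)) {
    const match = pattern.exec(rawLine.trim());
    if (!match) continue;
    items.push({
      label: match[1],
      amount: Number(match[2].replace(/,/g, "")), // drop thousands separators
    });
  }
  return items;
}

/** Formats items as tab-separated rows, which spreadsheets accept on paste. */
function toTsv(items: LineItem[]): string {
  return items
    .map((item) => `${item.label}\t${item.amount.toFixed(2)}`)
    .join("\n");
}

// Sample text standing in for OCR output (made-up values).
const sampleText = [
  "Form W-2 Wage and Tax Statement",
  "Wages, salaries, tips    52,340.00",
  "Taxable interest  $118.42",
].join("\n");
const rows = toTsv(parseLineItems(sampleText));
```

With the sample text, rows holds two tab-separated lines, one for wages and one for interest; the form title is skipped because it does not end in an amount. Always check the parsed figures against the scan, since a single misread digit changes the value.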

The OCR for legal documents variation covers extraction from contracts, filings, and court documents.

Accuracy note: Tesseract.js performs best on clean, high-resolution scans with standard typefaces. For a scanned tax form printed on standard paper and scanned at 300 DPI or higher, accuracy is typically 95-98% on printed text. Handwritten text, low-resolution faxes, and documents with heavy background patterns yield lower accuracy — scan at the highest quality your equipment supports.
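Because accuracy is never perfect, it helps to know which words to double-check. The sketch below assumes the OCR engine reports a confidence score from 0 to 100 for each word. Tesseract exposes per-word confidence, but the exact shape of the result object varies by library version, so the OcrWord type, the function name, and the threshold of 90 are all assumptions for illustration.

```typescript
// Assumed minimal shape for a recognized word; adapt to your OCR library.
interface OcrWord {
  text: string;
  confidence: number; // 0 to 100, higher means more certain
}

/** Returns the words whose confidence falls below the review threshold. */
function wordsNeedingReview(words: OcrWord[], minConfidence = 90): string[] {
  return words
    .filter((word) => word.confidence < minConfidence)
    .map((word) => word.text);
}

// Made-up sample: the amount was read with low confidence.
const sampleWords: OcrWord[] = [
  { text: "Wages", confidence: 96 },
  { text: "52,340.00", confidence: 71 },
  { text: "Interest", confidence: 93 },
];
const toReview = wordsNeedingReview(sampleWords);
```

In this sample, toReview contains only "52,340.00", which is exactly the kind of value worth checking against the original scan before it goes into a spreadsheet.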

When to Use a Dedicated Desktop Tool

Browser-based document processing handles the everyday tasks that create most privacy concerns: reading, comparing, and extracting text. Use a dedicated desktop PDF tool for specialized operations that modify or validate the PDF itself.

Very large files. Browser memory is shared with everything else the browser is doing. Very large PDFs above 100MB can process slowly or exhaust tab memory. Contracts, reports, and financial statements usually stay below that threshold. Large architectural drawings, book-length manuscripts, and high-resolution scanned archives belong in a dedicated desktop workflow.
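A tool can apply this rule of thumb before it loads anything, since a selected file's size is known without reading its contents. The following is a minimal sketch; the constant and function names are hypothetical, and the 100MB figure is simply the threshold mentioned above, not a hard browser limit.

```typescript
// Rule-of-thumb limit from the text above; real limits depend on the device.
const MAX_BROWSER_BYTES = 100 * 1024 * 1024;

/** Returns true when a file is small enough to try processing in a tab. */
function fitsInBrowserWorkflow(
  sizeBytes: number,
  limitBytes: number = MAX_BROWSER_BYTES
): boolean {
  return sizeBytes <= limitBytes;
}

// In a page, the size would come from the `size` property of the selected File.
const contractOk = fitsInBrowserWorkflow(2 * 1024 * 1024); // 2MB contract
const archiveOk = fitsInBrowserWorkflow(350 * 1024 * 1024); // 350MB scan archive
```

Here contractOk is true and archiveOk is false, so the archive would be routed to a desktop workflow instead of being loaded into tab memory.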

Complex interactive forms. Some PDFs embed JavaScript in their form fields to calculate totals, validate inputs, or trigger conditional behavior. These forms are common in tax preparation, insurance applications, and some government filings. Use a dedicated PDF reader when the form logic itself is the task.

Digital signature verification. Some signed PDFs rely on a specific approved certificate trust list. Browser viewers can show that signatures exist, but formal verification belongs in a dedicated PDF application that supports the required trust chain.

Permanent redaction. Applying permanent redactions to a PDF, where the underlying text is actually removed rather than covered, requires a tool that modifies the PDF's internal structure. Use a dedicated PDF editor for that workflow.

The decision rule is simple: use browser-based local processing when you need to read, compare, or extract text without uploading. Use desktop PDF software when you need to execute form scripts, validate specialized signatures, or permanently alter the PDF structure.