Reference

PDF Glossary

34 key terms for working with PDFs, optimizing for AI search engines, and understanding document privacy. Each definition is short, plain-English, and linked to a deeper resource where available.

A

AEO
Answer Engine Optimization — structuring content so AI answer engines (ChatGPT, Perplexity, Google AI Overviews) can extract and cite clean answers.

B

Browser-Only Processing
Computing operations executed entirely in the user's web browser using JavaScript and WebAssembly, so file data never leaves the device.
Bates Numbering
A sequential numbering system applied to the pages of legal documents, including PDFs, so each page can be uniquely identified and cited. Common in eDiscovery.
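As a rough illustration of the numbering scheme, the sketch below generates sequential Bates labels. The prefix `ABC`, the six-digit width, and the function name `bates_labels` are illustrative assumptions, not a convention taken from any particular tool.

```python
def bates_labels(prefix: str, start: int, count: int, width: int = 6) -> list[str]:
    """Return `count` sequential labels, zero-padded to `width` digits."""
    return [f"{prefix}{number:0{width}d}" for number in range(start, start + count)]


# Hypothetical example: label three pages starting at 1.
labels = bates_labels("ABC", 1, 3)
print(labels)
```

Zero-padding keeps labels the same length, so they sort correctly as plain text.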

C

Compression
Reducing PDF file size by optimizing images, removing duplicate resources, and rewriting streams more efficiently while preserving visual quality.
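The stream-rewriting part of this definition can be sketched with Python's `zlib` module, which implements the same deflate algorithm that PDF's Flate filter relies on. The content stream below is made up for illustration, and the sketch covers only lossless stream compression, not image optimization or duplicate removal.

```python
import zlib

# A made-up, highly repetitive page content stream (illustrative only).
raw_stream = b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET\n" * 200

# Compress at the highest level, then restore to confirm nothing was lost.
compressed = zlib.compress(raw_stream, 9)
restored = zlib.decompress(compressed)

print(len(raw_stream), "bytes before,", len(compressed), "bytes after")
```

Because deflate is lossless, the restored bytes match the original exactly. That is why stream compression does not affect visual quality.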
Canonical URL
The official URL of a web page, declared in HTML to prevent duplicate-content issues when the same page is reachable at multiple URLs.
CTR
Click-Through Rate: the percentage of impressions of a search result or ad that lead to a click (clicks divided by impressions, times 100), commonly tracked as a measure of how compelling a listing is.
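The arithmetic behind CTR is simple enough to show directly. This is a minimal sketch; the function name and the choice to return 0.0 when there are no impressions are assumptions made for the example.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Return CTR as a percentage; 0.0 when nothing was shown."""
    if impressions <= 0:
        return 0.0
    return 100.0 * clicks / impressions


# Hypothetical numbers: 25 clicks from 1,000 impressions.
print(click_through_rate(25, 1000))
```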

D

Digital Signature
A cryptographic signature embedded in a PDF that lets a reader verify who signed the document and that it has not been altered since signing, typically following the PDF Advanced Electronic Signatures (PAdES) standard.
Dofollow Link
A backlink that passes SEO authority (PageRank) to the linked page. This is the default behavior of any link not marked nofollow; there is no actual "dofollow" attribute.

E

E-E-A-T
Experience, Expertise, Authoritativeness, and Trustworthiness: Google's framework for evaluating content quality, especially for YMYL ("Your Money or Your Life") topics such as health and finance.
eDiscovery
The process of identifying, collecting, and producing electronically stored information in response to a legal request, often involving large PDF datasets.

G

GEO
Generative Engine Optimization — optimizing content specifically for LLM crawlers to parse, understand, and reference in their generated responses.

H

Hreflang
An HTML attribute that signals to search engines which language and regional version of a page to show users based on their locale.
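In practice the attribute usually appears on `<link rel="alternate">` elements in the page head, one per language version. The sketch below builds those tags from a mapping; the URLs and the helper name `hreflang_links` are hypothetical.

```python
from html import escape


def hreflang_links(versions: dict[str, str]) -> list[str]:
    """Build one <link rel="alternate"> tag per language/region code."""
    return [
        f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}">'
        for code, url in versions.items()
    ]


# Hypothetical language versions of the same page.
tags = hreflang_links({
    "en-us": "https://example.com/en/",
    "de-de": "https://example.com/de/",
})
print("\n".join(tags))
```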

I

IndexNow
A URL submission protocol, supported by search engines including Bing and Yandex, that notifies them of new or updated pages so changes can be picked up quickly.
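A submission is essentially a small JSON document listing the changed URLs. The sketch below only builds that document and does not send it. The field names reflect the published IndexNow protocol as best understood here and should be checked against the current documentation; the host, key, and key-file location are placeholders.

```python
import json


def indexnow_payload(host: str, key: str, urls: list[str]) -> str:
    """Build (but do not send) a JSON body for a bulk URL submission."""
    body = {
        "host": host,
        "key": key,
        # Assumed location of the key file that proves site ownership.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": list(urls),
    }
    return json.dumps(body, indent=2)


# Placeholder values for illustration only.
payload = indexnow_payload(
    "example.com",
    "0123456789abcdef",
    ["https://example.com/glossary", "https://example.com/tools/merge"],
)
print(payload)
```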

J

JavaScript
A programming language that runs in web browsers, used by JadePDF to manipulate PDF data structures without server-side processing.

L

llms.txt
A proposed standard file (Markdown format) at the site root that tells LLM crawlers what your site does, what pages matter, and how to extract content.
Long-tail Keywords
Specific multi-word search queries with lower search volume but higher intent, easier to rank for on new domains than broad head terms.

M

Metadata
Hidden information embedded in a PDF: author, title, creation date, software used, and XMP data that identifies the document.

N

Nofollow Link
A backlink with rel="nofollow" that asks search engines not to pass SEO authority, commonly used for user-generated content, social media, and comments. Google has treated it as a hint rather than a strict directive since 2019.
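To see how the attribute is read in practice, the sketch below scans a fragment of HTML and reports which links carry a `nofollow` token. The class name and sample markup are illustrative assumptions.

```python
from html.parser import HTMLParser


class LinkRelParser(HTMLParser):
    """Collect (href, is_nofollow) pairs for every <a> tag with an href."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, bool]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel may hold several space-separated tokens, e.g. "nofollow ugc".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" in rel_tokens))


sample_html = (
    '<a href="https://example.com/a">editorial link</a>'
    '<a href="https://example.com/b" rel="nofollow ugc">comment link</a>'
)
parser = LinkRelParser()
parser.feed(sample_html)
print(parser.links)
```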

O

OCR
Optical Character Recognition — technology that extracts text from images and scanned documents, converting pixels into searchable, editable text.

P

PDF
Portable Document Format: a file format developed by Adobe and now standardized as ISO 32000, which captures document text, fonts, images, and layout for reliable viewing across platforms.
pdf-lib
A popular open-source JavaScript library for creating and modifying PDF documents in the browser, used by JadePDF for most client-side operations.
PAdES
PDF Advanced Electronic Signatures — a set of restrictions and extensions to PDF for advanced electronic signatures, recognized in the EU eIDAS regulation.
Page Authority
A score from 1 to 100, developed by Moz, that predicts how well a specific page will rank in search engines. It is a third-party estimate, not a metric used by the search engines themselves.
PDF/A
An ISO-standardized subset of PDF specialized for long-term archival, embedding all fonts and prohibiting external dependencies.
PDF/UA
PDF/Universal Accessibility — a PDF standard ensuring content is accessible to people with disabilities, requiring tagged structure and metadata.

R

Redaction
Permanently removing sensitive text or images from a PDF so the underlying data cannot be recovered. This differs from a black-box overlay, which only hides content that remains in the file.
robots.txt
A file at the site root that tells web crawlers which pages or directories they may or may not access.
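Python's standard library includes a parser for this file, which makes the allow/disallow logic easy to try out. The rules, bot name, and URLs below are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block one directory, allow everything else.
rules = """\
User-agent: *
Disallow: /private/
Allow: /
"""

robots = RobotFileParser()
robots.parse(rules.splitlines())

blocked = robots.can_fetch("ExampleBot", "https://example.com/private/report.pdf")
allowed = robots.can_fetch("ExampleBot", "https://example.com/tools/")
print(blocked, allowed)
```

The file is advisory. Well-behaved crawlers honor it, but it is not an access control mechanism.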

S

Structured Data
Machine-readable markup (JSON-LD, Microdata, RDFa) embedded in HTML that helps search engines understand page content for rich results.
Schema.org
A collaborative vocabulary of structured-data schemas founded by Google, Microsoft, Yahoo, and Yandex and developed through an open community process, used for SEO and knowledge graphs.
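As a small example of structured data built on the Schema.org vocabulary, the sketch below emits a JSON-LD block describing one glossary entry with the `DefinedTerm` type. The helper name and the choice of type are assumptions made for illustration; other types may suit a given page better.

```python
import json


def defined_term_jsonld(name: str, description: str) -> str:
    """Wrap a Schema.org DefinedTerm in a JSON-LD <script> element."""
    data = {
        "@context": "https://schema.org",
        "@type": "DefinedTerm",
        "name": name,
        "description": description,
    }
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


snippet = defined_term_jsonld(
    "OCR",
    "Technology that extracts text from images and scanned documents.",
)
print(snippet)
```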
Sandbox Period
A widely reported but unconfirmed period, often put at 2-3 months, during which new domains appear to have limited visibility in Google search until they build trust through backlinks and quality signals.
Sitemap
An XML file listing all indexable pages on a site, helping search engines discover and crawl content efficiently.
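A minimal sitemap is a `urlset` element containing one `url` and `loc` pair per page. The sketch below generates one with the standard library; the URLs are placeholders and optional fields such as `lastmod` are left out.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str]) -> str:
    """Return a minimal sitemap document as a string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page_url in urls:
        url_element = ET.SubElement(urlset, "url")
        ET.SubElement(url_element, "loc").text = page_url
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    "https://example.com/",
    "https://example.com/glossary",
])
print(sitemap_xml)
```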

W

WebAssembly (Wasm)
A binary instruction format that runs code at near-native speed in web browsers, enabling JadePDF to process PDFs locally without server uploads.
WebRTC
Web Real-Time Communication — a browser API for peer-to-peer audio, video, and data transfer, occasionally used for collaborative PDF editing.

X

XMP
Extensible Metadata Platform — an ISO standard for embedding structured metadata in PDFs, images, and other file formats.
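Because XMP is serialized as RDF/XML, a packet can be read with any XML parser. The sketch below pulls three common fields out of a simplified, made-up packet; real packets extracted from PDFs are usually larger and may store some values in nested structures that this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up XMP packet for illustration.
XMP_PACKET = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:xmp="http://ns.adobe.com/xap/1.0/"
        xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
      <xmp:CreatorTool>Example Writer 1.0</xmp:CreatorTool>
      <xmp:CreateDate>2024-01-15T09:30:00Z</xmp:CreateDate>
      <pdf:Producer>Example PDF Library</pdf:Producer>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

NAMESPACES = {
    "xmp": "http://ns.adobe.com/xap/1.0/",
    "pdf": "http://ns.adobe.com/pdf/1.3/",
}


def read_xmp_fields(packet: str) -> dict[str, str]:
    """Return the simple text fields found in the packet."""
    root = ET.fromstring(packet)
    fields = {}
    for key in ("xmp:CreatorTool", "xmp:CreateDate", "pdf:Producer"):
        node = root.find(f".//{key}", NAMESPACES)
        if node is not None and node.text:
            fields[key] = node.text.strip()
    return fields


xmp_fields = read_xmp_fields(XMP_PACKET)
print(xmp_fields)
```

Fields like these are what a metadata-removal step would need to clear before a document is shared.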

Try the tools behind these terms

Every concept in this glossary is implemented as a free, browser-based tool. No upload, no signup, no daily limits.
