Reference
PDF Glossary
34 key terms for working with PDFs, optimizing for AI search engines, and understanding document privacy. Each definition is short, plain-English, and linked to a deeper resource where available.
A
- AEO
- Answer Engine Optimization — structuring content so AI answer engines (ChatGPT, Perplexity, Google AI Overviews) can extract and cite clean answers.
B
- Bates Numbering
- A sequential numbering system applied to each page of a set of legal documents (typically PDFs) so every page can be uniquely identified and referenced; common in eDiscovery.
- Browser-Only Processing
- Computing operations executed entirely in the user's web browser using JavaScript and WebAssembly, so file data never leaves the device.
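Bates labels commonly combine a fixed prefix with a zero-padded page counter that keeps counting across every document in a production. The sketch below only generates the label strings; the `batesLabels` helper, the `ABC` prefix, and the six-digit width are hypothetical choices, and stamping the labels onto the pages themselves is left out.

```typescript
// Hypothetical helper, not part of any specific library.
// Produces one Bates label per page, continuing the sequence across documents.
interface BatesOptions {
  prefix: string; // e.g. a matter or party code (assumed format)
  start: number;  // first number in the sequence
  digits: number; // zero-padding width
}

function batesLabels(pageCounts: number[], opts: BatesOptions): string[][] {
  const { prefix, start, digits } = opts;
  let next = start;
  // One inner array per document; numbering never restarts between documents.
  return pageCounts.map((pages) => {
    const labels: string[] = [];
    for (let i = 0; i < pages; i++) {
      labels.push(prefix + String(next).padStart(digits, "0"));
      next++;
    }
    return labels;
  });
}

// Usage: two documents with 2 and 3 pages.
const labels = batesLabels([2, 3], { prefix: "ABC", start: 1, digits: 6 });
// labels[0] is ["ABC000001", "ABC000002"]; labels[1] continues at "ABC000003".
console.log(labels);
```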
C
- Canonical URL
- The official URL of a web page, declared in HTML to prevent duplicate-content issues when the same page is reachable at multiple URLs.
- Compression
- Reducing PDF file size by optimizing images, removing duplicate resources, and rewriting streams more efficiently while preserving visual quality.
- CTR
- Click-Through Rate — the percentage of users who see a search result or ad and click on it, widely used to gauge how well a title and snippet match searcher intent.
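CTR is clicks divided by impressions, expressed as a percentage. A minimal sketch; the `ctr` helper and its choice to return 0 when there are no impressions are assumptions made for illustration.

```typescript
// CTR = clicks / impressions × 100
function ctr(clicks: number, impressions: number): number {
  if (impressions <= 0) return 0; // assumed convention: no impressions, no CTR
  if (clicks < 0 || clicks > impressions) {
    throw new RangeError("clicks must be between 0 and impressions");
  }
  return (clicks / impressions) * 100;
}

// 45 clicks out of 1,500 impressions:
console.log(ctr(45, 1500).toFixed(1) + "%"); // 3.0%
```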
D
- Digital Signature
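- A cryptographic signature embedded in a PDF that proves who signed it (authenticity) and that the document has not changed since signing (integrity); advanced signatures typically follow the PDF Advanced Electronic Signatures (PAdES) standard.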
- Dofollow Link
- A backlink that passes SEO authority (PageRank) to the linked page, the default behavior unless explicitly marked nofollow.
E
- E-E-A-T
- Experience, Expertise, Authoritativeness, and Trustworthiness — the quality framework in Google's Search Quality Rater Guidelines, applied most strictly to YMYL ("Your Money or Your Life") topics such as health and finance.
- eDiscovery
- The process of identifying, collecting, and producing electronically stored information in response to a legal request, often involving large PDF datasets.
G
- GEO
- Generative Engine Optimization — optimizing content so that LLM-powered search and chat engines can parse, understand, and reference it in their generated responses.
H
- Hreflang
- An HTML attribute that signals to search engines which language and regional version of a page to show users based on their locale.
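In practice, hreflang usually appears as `<link rel="alternate" hreflang="de">`-style tags in a page's `<head>`, one per language or regional version, and each version should list all the others as well as itself. The optional `x-default` value marks the fallback page for users whose locale matches none of the listed versions. A minimal generator sketch; the `hreflangTags` helper and the example.com URLs are hypothetical.

```typescript
// Hypothetical helper: builds the hreflang <link> tags for one set of
// equivalent pages. The same block belongs on every language version.
function hreflangTags(
  versions: Record<string, string>, // locale code -> absolute URL
  xDefault?: string,                // fallback URL for unmatched locales
): string[] {
  // Minimal attribute escaping so URLs cannot break out of the quotes.
  const escape = (s: string) =>
    s.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
  const tags = Object.entries(versions).map(
    ([lang, url]) =>
      `<link rel="alternate" hreflang="${escape(lang)}" href="${escape(url)}">`,
  );
  if (xDefault) {
    tags.push(`<link rel="alternate" hreflang="x-default" href="${escape(xDefault)}">`);
  }
  return tags;
}

const tags = hreflangTags(
  {
    en: "https://example.com/en/glossary",
    "de-DE": "https://example.com/de/glossary",
  },
  "https://example.com/glossary",
);
console.log(tags.join("\n"));
```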
I
- IndexNow
- A URL submission protocol supported by Bing, Yandex, and other participating search engines that lets a site notify them immediately when pages are added, updated, or deleted, so the changes can be crawled sooner.
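Under IndexNow, a site proves ownership with a key file hosted on its own domain, then POSTs a JSON list of changed URLs to a participating endpoint. The sketch below only builds the request and sends nothing. The `buildIndexNowRequest` helper, the example host, and the key value are hypothetical; the field names, key format, and 10,000-URL limit are assumed from the public documentation at indexnow.org and should be checked there.

```typescript
// Sketch of an IndexNow bulk submission (assumed JSON POST format).
// The returned object can be passed to fetch(req.endpoint, req.init).
interface IndexNowRequest {
  endpoint: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildIndexNowRequest(
  host: string,
  key: string,
  urls: string[],
  endpoint = "https://api.indexnow.org/indexnow",
): IndexNowRequest {
  // Keys are 8-128 characters of letters, digits, and dashes.
  if (!/^[A-Za-z0-9-]{8,128}$/.test(key)) throw new Error("invalid IndexNow key");
  if (urls.length === 0 || urls.length > 10_000) {
    throw new Error("submit between 1 and 10,000 URLs per request");
  }
  // Every submitted URL must belong to the host that owns the key.
  for (const u of urls) {
    if (new URL(u).host !== host) throw new Error(`URL not on ${host}: ${u}`);
  }
  const body = {
    host,
    key,
    keyLocation: `https://${host}/${key}.txt`, // key file served from the site root
    urlList: urls,
  };
  return {
    endpoint,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json; charset=utf-8" },
      body: JSON.stringify(body),
    },
  };
}

const req = buildIndexNowRequest("www.example.com", "a1b2c3d4e5f6", [
  "https://www.example.com/glossary",
]);
// await fetch(req.endpoint, req.init); // requires network access
```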
J
- JavaScript
- A programming language that runs in web browsers, used by JadePDF to manipulate PDF data structures without server-side processing.
L
- llms.txt
- A proposed standard: a Markdown file served at /llms.txt in the site root that gives LLMs and their crawlers a concise summary of what your site does and links to the pages that matter most.
- Long-tail Keywords
- Specific multi-word search queries with lower search volume but higher intent, easier to rank for on new domains than broad head terms.
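The llms.txt proposal (llmstxt.org) describes a simple layout: an H1 with the site name, a blockquote summary, then H2 sections containing Markdown link lists with optional notes. A minimal generator sketch; the `renderLlmsTxt` helper and the example.com URL are hypothetical.

```typescript
// Hypothetical generator following the layout proposed at llmstxt.org.
interface LlmsLink { title: string; url: string; note?: string }
interface LlmsSite {
  name: string;
  summary: string;
  sections: Record<string, LlmsLink[]>; // H2 heading -> links
}

function renderLlmsTxt(site: LlmsSite): string {
  const lines: string[] = [`# ${site.name}`, "", `> ${site.summary}`];
  for (const [heading, links] of Object.entries(site.sections)) {
    lines.push("", `## ${heading}`, "");
    for (const { title, url, note } of links) {
      // "- [title](url): note", with the note omitted when absent
      lines.push(`- [${title}](${url})${note ? `: ${note}` : ""}`);
    }
  }
  return lines.join("\n") + "\n";
}

const txt = renderLlmsTxt({
  name: "JadePDF",
  summary: "Free, browser-based PDF tools; files are processed locally and never uploaded.",
  sections: {
    Docs: [{ title: "PDF Glossary", url: "https://example.com/glossary", note: "34 key terms" }],
  },
});
console.log(txt);
```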
M
- Metadata
- Hidden information embedded in a PDF: author, title, creation date, software used, and XMP data that identifies the document.
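The creation and modification dates in a PDF's document information dictionary are stored as PDF date strings such as `D:20240315093000+01'00'`. A minimal sketch that converts them to ISO 8601; the `pdfDateToIso` helper is hypothetical, tolerates the trailing apostrophe some writers emit, and does not range-check fields (month 13 would pass).

```typescript
// Parses "D:YYYYMMDDHHmmSSOHH'mm'" where everything after the year is
// optional and O is "+", "-", or "Z". Missing fields default to the start
// of the period; a missing offset means the relation to UTC is unknown.
function pdfDateToIso(raw: string): string | null {
  const m = raw.trim().match(
    /^(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?(?:([+\-Z])(?:(\d{2})'?(?:(\d{2})'?)?)?)?$/,
  );
  if (!m) return null;
  // Unmatched groups are undefined, so the defaults below apply.
  const [, y, mo = "01", d = "01", h = "00", mi = "00", s = "00", sign, oh = "00", om = "00"] = m;
  let offset = "";
  if (sign === "Z") offset = "Z";
  else if (sign) offset = `${sign}${oh}:${om}`;
  return `${y}-${mo}-${d}T${h}:${mi}:${s}${offset}`;
}

console.log(pdfDateToIso("D:20240315093000+01'00'")); // 2024-03-15T09:30:00+01:00
console.log(pdfDateToIso("D:199812231952-08'00"));    // 1998-12-23T19:52:00-08:00
```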
N
- Nofollow Link
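- A link marked rel="nofollow", which asks search engines not to pass SEO authority (Google treats it as a hint, not a directive); commonly used for user-generated content, social media, and comments, alongside the more specific rel="ugc" and rel="sponsored" values.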
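The difference between dofollow and nofollow links comes down to the tokens in a link's `rel` attribute, which is a space-separated, case-insensitive list. A rough model of that check; the `classifyRel` helper and its labels are hypothetical, and real search engines weigh these hints alongside other signals.

```typescript
// "hint-no-pass" means the link asks not to pass authority; Google treats
// nofollow, ugc, and sponsored as hints rather than strict directives.
type LinkTreatment = "may-pass-authority" | "hint-no-pass";

const NO_PASS_TOKENS = new Set(["nofollow", "ugc", "sponsored"]);

function classifyRel(rel: string | null | undefined): LinkTreatment {
  const tokens = (rel ?? "").toLowerCase().split(/\s+/).filter(Boolean);
  return tokens.some((t) => NO_PASS_TOKENS.has(t)) ? "hint-no-pass" : "may-pass-authority";
}

console.log(classifyRel(undefined));             // may-pass-authority (the "dofollow" default)
console.log(classifyRel("noopener noreferrer")); // may-pass-authority
console.log(classifyRel("UGC nofollow"));        // hint-no-pass
```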
O
- OCR
- Optical Character Recognition — technology that extracts text from images and scanned documents, converting pixels into searchable, editable text.
P
- PAdES
- PDF Advanced Electronic Signatures — a set of restrictions and extensions to PDF for advanced electronic signatures, recognized in the EU eIDAS regulation.
- Page Authority
- A proprietary Moz score (1-100) predicting how well a specific page will rank in search results, calculated mainly from link data such as the number and quality of backlinks.
- PDF
- Portable Document Format — a file format developed by Adobe that captures document text, fonts, images, and layout for reliable viewing across platforms.
- PDF/A
- An ISO-standardized subset of PDF designed for long-term archiving: all fonts must be embedded, and features that depend on external resources are prohibited.
- PDF/UA
- PDF/Universal Accessibility — a PDF standard ensuring content is accessible to people with disabilities, requiring tagged structure and metadata.
- pdf-lib
- A popular open-source JavaScript library for creating and modifying PDF documents in the browser, used by JadePDF for most client-side operations.
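At the byte level, a PDF file begins with a version header such as `%PDF-1.7` and ends with an `%%EOF` marker. A quick format check built on that; the `sniffPdf` helper is hypothetical, strictly requires the header at byte 0 (some readers are more lenient), and identifies the format only. It says nothing about whether the file is valid or conforms to PDF/A or PDF/UA.

```typescript
function sniffPdf(bytes: Uint8Array): { isPdf: boolean; version?: string } {
  // Decode a byte range one byte per character (enough for ASCII markers).
  const ascii = (from: number, to: number) =>
    Array.from(bytes.subarray(from, to), (b) => String.fromCharCode(b)).join("");
  const header = ascii(0, 16).match(/^%PDF-(\d\.\d)/);
  if (!header) return { isPdf: false };
  // Writers may append whitespace after the marker, so search the last 1 KB.
  const tail = ascii(Math.max(0, bytes.length - 1024), bytes.length);
  return { isPdf: tail.includes("%%EOF"), version: header[1] };
}

const sample = new TextEncoder().encode(
  "%PDF-1.7\n1 0 obj\n<< /Type /Catalog >>\nendobj\ntrailer\n<< /Root 1 0 R >>\n%%EOF\n",
);
console.log(sniffPdf(sample)); // isPdf: true, version: "1.7"
```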
R
- Redaction
- Permanently removing sensitive text or images from a PDF so the underlying data cannot be recovered. This differs from drawing black boxes over content, which hides it visually but leaves the original text in the file.
- robots.txt
- A file at the site root that tells web crawlers which pages or directories they may or may not access.
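Crawlers that honor robots.txt (standardized as RFC 9309) pick the group of rules for their user agent, falling back to `*`, then apply the most specific matching `Allow` or `Disallow` rule, with `Allow` winning ties. The file is advisory and does not enforce access control. A simplified evaluator sketch; the helper names are hypothetical, user agents are compared as exact lowercase product tokens, rule specificity is measured in characters, and fields other than `Allow`/`Disallow` are ignored.

```typescript
interface Rule { allow: boolean; pattern: string }

function parseRobots(text: string): Map<string, Rule[]> {
  const groups = new Map<string, Rule[]>();
  let agents: string[] = [];
  let lastWasAgent = false;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*/, "").trim(); // strip comments
    const idx = line.indexOf(":");
    if (idx < 0) continue;
    const field = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    if (field === "user-agent") {
      if (!lastWasAgent) agents = []; // a user-agent after rules starts a new group
      const name = value.toLowerCase();
      agents.push(name);
      if (!groups.has(name)) groups.set(name, []); // repeated agents share one group
      lastWasAgent = true;
    } else if (field === "allow" || field === "disallow") {
      lastWasAgent = false;
      if (!value) continue; // an empty rule matches nothing
      for (const a of agents) groups.get(a)!.push({ allow: field === "allow", pattern: value });
    }
  }
  return groups;
}

function patternToRegExp(pattern: string): RegExp {
  // "*" matches any sequence; a trailing "$" anchors the end of the path.
  const anchored = pattern.endsWith("$");
  const body = (anchored ? pattern.slice(0, -1) : pattern)
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp("^" + body + (anchored ? "$" : ""));
}

function isAllowed(robotsTxt: string, userAgent: string, path: string): boolean {
  const groups = parseRobots(robotsTxt);
  const rules = groups.get(userAgent.toLowerCase()) ?? groups.get("*") ?? [];
  let best: Rule | undefined;
  for (const rule of rules) {
    if (!patternToRegExp(rule.pattern).test(path)) continue;
    // Longest pattern wins; on a tie, Allow beats Disallow.
    if (
      !best ||
      rule.pattern.length > best.pattern.length ||
      (rule.pattern.length === best.pattern.length && rule.allow)
    ) {
      best = rule;
    }
  }
  return best ? best.allow : true; // no matching rule means allowed
}

const robots = `
User-agent: *
Disallow: /private/
Allow: /private/press-kit.pdf

User-agent: ExampleBot
Disallow: /
`;
console.log(isAllowed(robots, "Googlebot", "/private/report.pdf"));    // false
console.log(isAllowed(robots, "Googlebot", "/private/press-kit.pdf")); // true
console.log(isAllowed(robots, "ExampleBot", "/glossary"));             // false
```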
S
- Sandbox Period
- A commonly reported period (often estimated at 2-3 months) during which new domains struggle to gain search visibility until they build trust through backlinks and quality signals; Google has not confirmed that it deliberately applies such a "sandbox."
- Schema.org
- A collaborative vocabulary of structured-data schemas, created by Google, Microsoft, Yahoo, and Yandex and now developed through an open community process, used for SEO and knowledge graphs.
- Sitemap
- An XML file listing the pages on a site that should be indexed, helping search engines discover and crawl content efficiently.
- Structured Data
- Machine-readable markup (JSON-LD, Microdata, RDFa) embedded in HTML that helps search engines understand page content for rich results.
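As a concrete case of structured data built on the Schema.org vocabulary, a glossary page like this one could describe itself with the `DefinedTermSet` and `DefinedTerm` types in a JSON-LD `<script>` block. A minimal sketch; the `glossaryJsonLd` helper and the URL are placeholders, and whether a given search engine uses this particular markup is not guaranteed.

```typescript
interface GlossaryEntry { term: string; definition: string }

function glossaryJsonLd(name: string, url: string, entries: GlossaryEntry[]): string {
  const data = {
    "@context": "https://schema.org",
    "@type": "DefinedTermSet",
    name,
    url,
    hasDefinedTerm: entries.map((e) => ({
      "@type": "DefinedTerm",
      name: e.term,
      description: e.definition,
    })),
  };
  // Escape "<" as \u003c so text such as "</script>" cannot close the tag early.
  const json = JSON.stringify(data, null, 2).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">\n${json}\n</script>`;
}

const html = glossaryJsonLd("PDF Glossary", "https://example.com/glossary", [
  { term: "PDF/A", definition: "An ISO-standardized subset of PDF designed for long-term archiving." },
]);
console.log(html);
```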
W
- WebAssembly (Wasm)
- A binary instruction format that runs code at near-native speed in web browsers, enabling JadePDF to process PDFs locally without server uploads.
- WebRTC
- Web Real-Time Communication — a browser API for peer-to-peer audio, video, and data transfer, occasionally used for collaborative PDF editing.
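To make the WebAssembly entry concrete, the sketch below compiles a tiny hand-written module exporting an `add(i32, i32)` function and calls it from TypeScript. It illustrates the standard WebAssembly JavaScript API only; the module is a toy assumption and is not how JadePDF's own Wasm code is built.

```typescript
// Raw bytes of the module written in WAT as:
//   (module (func (export "add") (param i32 i32) (result i32)
//     local.get 0 local.get 1 i32.add))
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic "\0asm", version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: one function of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" = func 0
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // code: get 0, get 1, add
]);

// Synchronous compilation is fine for a module this small; larger modules
// are normally loaded with WebAssembly.instantiateStreaming(fetch(url)).
const instance = new WebAssembly.Instance(new WebAssembly.Module(wasmBytes));
const add = instance.exports.add as (a: number, b: number) => number;

console.log(add(2, 3)); // 5
```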
X
- XMP
- Extensible Metadata Platform — an ISO standard for embedding structured metadata in PDFs, images, and other file formats.
Try the tools behind these terms
Many of the PDF concepts in this glossary are available as free, browser-based tools. No upload, no signup, no daily limits.
Browse all 89 tools