Extracts text, tables, and images from PDF documents using the pypdf and pdfplumber Python libraries.
Install on your platform
Run in terminal (recommended)
claude mcp add davepoon-pdf npx -- -y @trustedskills/davepoon-pdf
Or manually add to ~/.claude/settings.json
{
  "mcpServers": {
    "davepoon-pdf": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/davepoon-pdf"
      ]
    }
  }
}

Requires Claude Code (the claude CLI). Run claude --version to verify your install.
About This Skill
What it does
This skill lets an AI agent process PDF documents with Python libraries. It can read PDFs, extract text and tables, merge several PDFs into one document, split a PDF into individual pages, rotate pages, and extract metadata such as title, author, subject, and creator. It uses pypdf for basic operations and pdfplumber for layout-aware text and table extraction.
When to use it
- You need to extract the text content from a PDF document.
- You want to combine multiple PDF files into one larger file.
- You require individual pages from a PDF as separate files.
- You need to retrieve metadata (title, author, etc.) associated with a PDF.
- You are working with PDFs containing tables and need to extract that data.
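For the table case in the list above, the following is a rough sketch of how extraction might look with pdfplumber. The function names are hypothetical, and treating the first row of each table as a header is an assumption that will not hold for every PDF.

```python
def table_to_records(table):
    """Convert one table (a list of rows of cell strings) into dicts keyed by its first row.

    Assumption: the first row is a header. Empty cells (None) become "".
    """
    if not table:
        return []
    header = [(cell or "").strip() for cell in table[0]]
    records = []
    for row in table[1:]:
        cells = [(cell or "").strip() for cell in row]
        records.append(dict(zip(header, cells)))
    return records


def extract_table_records(pdf_path):
    """Collect every table pdfplumber finds, page by page."""
    # Imported here so table_to_records stays usable without pdfplumber installed.
    import pdfplumber

    results = []
    with pdfplumber.open(pdf_path) as pdf:
        for page_number, page in enumerate(pdf.pages, start=1):
            for table in page.extract_tables():
                results.append(
                    {"page": page_number, "records": table_to_records(table)}
                )
    return results


if __name__ == "__main__":
    sample = [["Item", "Qty"], ["Widget", "2"], ["Gadget", None]]
    print(table_to_records(sample))
```

Calling extract_table_records("input.pdf") would return one entry per detected table; how well tables are detected depends on the layout of the PDF.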
Key capabilities
- PDF Reading: Opens and reads PDF documents using pypdf.
- Text Extraction: Extracts text content from PDF pages.
- Table Extraction: Extracts tabular data, preserving layout, using pdfplumber.
- PDF Merging: Combines multiple PDF files into a single output file.
- PDF Splitting: Separates a PDF document into individual page files.
- Metadata Extraction: Retrieves metadata such as title, author, and subject from PDFs.
- Page Rotation: Rotates pages within a PDF document.
Example prompts
- "Extract the text from this PDF: [PDF file]"
- "Merge these PDF documents into one file: doc1.pdf, doc2.pdf."
- "Split 'input.pdf' into individual page files."
- "What is the title and author of this PDF document?"
Tips & gotchas
- This skill relies on Python libraries (pypdf and pdfplumber). Ensure these dependencies are available in the environment where the agent is running.
- The accuracy of text and table extraction depends heavily on the quality and structure of the original PDF. Complex layouts or scanned documents may yield less accurate results.
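Both tips can be checked up front. The sketch below uses only the standard library; the function names and the 20-character threshold are assumptions chosen for illustration, and a low character count is only a rough hint that a page may be a scan.

```python
from importlib.util import find_spec

REQUIRED_MODULES = ("pypdf", "pdfplumber")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be imported in this environment."""
    return [name for name in names if find_spec(name) is None]


def likely_scanned_pages(page_texts, min_chars=20):
    """Return 1-based page numbers whose extracted text is suspiciously short.

    page_texts is a list of strings, one per page, as produced by a text
    extraction step. min_chars is an arbitrary illustrative threshold.
    """
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len((text or "").strip()) < min_chars
    ]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Install before using the skill:", ", ".join(missing))
    print(likely_scanned_pages(["A full paragraph of extracted text.", "", "  "]))
```

Pages flagged this way are candidates for manual review, since the libraries named above read embedded text rather than recognizing text in images.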
TrustedSkills Verification
Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.
Security Audits
| Scanner | Result |
| --- | --- |
| Gen Agent Trust Hub | Pass |
| Socket | Pass |
| Snyk | Pass |
🌐 Community
Passed automated security scans.