Pdf

🌐 Community
by davepoon · vlatest · Repository

Extracts text, tables, and metadata from PDF documents, and merges, splits, and rotates PDF pages, using the pypdf and pdfplumber Python libraries.

Install on your platform


1. Run in terminal (recommended):

claude mcp add davepoon-pdf npx -- -y @trustedskills/davepoon-pdf

2. Or manually add to ~/.claude/settings.json:

{
  "mcpServers": {
    "davepoon-pdf": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/davepoon-pdf"
      ]
    }
  }
}

Requires Claude Code (claude CLI). Run claude --version to verify your install.

About This Skill

What it does

This skill lets an AI agent process PDF documents with Python libraries. It can read PDFs, extract text and tables, merge multiple PDFs into a single document, split a PDF into individual pages, rotate pages, and extract metadata such as title, author, subject, and creator. It uses pypdf for basic operations and pdfplumber for more advanced text and table extraction that preserves layout.

When to use it

  • You need to extract the text content from a PDF document.
  • You want to combine multiple PDF files into one larger file.
  • You require individual pages from a PDF as separate files.
  • You need to retrieve metadata (title, author, etc.) associated with a PDF.
  • You are working with PDFs containing tables and need to extract that data.

Key capabilities

  • PDF Reading: Opens and reads PDF documents using pypdf.
  • Text Extraction: Extracts text content from PDF pages.
  • Table Extraction: Extracts tabular data with pdfplumber, preserving layout.
  • PDF Merging: Combines multiple PDF files into a single output file.
  • PDF Splitting: Separates a PDF document into individual page files.
  • Metadata Extraction: Retrieves metadata such as title, author, and subject from PDFs.
  • Page Rotation: Rotates pages within a PDF document.

Example prompts

  • "Extract the text from this PDF: [PDF file]"
  • "Merge these PDF documents into one file: doc1.pdf, doc2.pdf."
  • "Split 'input.pdf' into individual page files."
  • "What is the title and author of this PDF document?"

Tips & gotchas

  • This skill relies on Python libraries (pypdf and pdfplumber). Ensure these dependencies are available in the environment where the agent is running.
  • The accuracy of text and table extraction depends heavily on the quality and structure of the original PDF. Complex layouts or scanned documents may yield less accurate results.


TrustedSkills Verification

Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.

Security Audits

Gen Agent Trust Hub: Pass
Socket: Pass
Snyk: Pass

Details

Version: vlatest
License:
Author: davepoon
Installs: 36


Passed automated security scans.