Document Processing

Name: Document Processing
Author: dirnbauer

🌐Community

Extracts key data such as dates, names, and values from PDF, Word, PowerPoint, and Excel documents for automated workflows.

Install on your platform

Run in terminal (recommended)

terminal

claude mcp add dirnbauer-document-processing npx -- -y @trustedskills/dirnbauer-document-processing

Or manually add to ~/.claude/settings.json

~/.claude/settings.json

{
  "mcpServers": {
    "dirnbauer-document-processing": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/dirnbauer-document-processing"
      ]
    }
  }
}

Requires Claude Code (claude CLI). Run claude --version to verify your install.
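
If the settings file already contains other entries, hand-editing it risks overwriting them. The following is a minimal sketch, not part of the skill itself, of merging the entry shown above into an existing settings dictionary with Python's standard json module. The function name add_mcp_server and the "other-server" entry are hypothetical and used only for illustration.

```python
from __future__ import annotations

import json
from copy import deepcopy


def add_mcp_server(settings: dict, name: str, command: str, args: list[str]) -> dict:
    """Return a copy of `settings` with one entry added under "mcpServers"."""
    updated = deepcopy(settings)
    # setdefault keeps any servers that are already configured.
    servers = updated.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": list(args)}
    return updated


if __name__ == "__main__":
    # Hypothetical existing configuration with one unrelated server.
    existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
    merged = add_mcp_server(
        existing,
        "dirnbauer-document-processing",
        "npx",
        ["-y", "@trustedskills/dirnbauer-document-processing"],
    )
    print(json.dumps(merged, indent=2))
```

The helper returns a new dictionary rather than mutating its input, so the caller decides when, and whether, to write the result back to disk.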

About This Skill

What it does

This skill lets AI agents create, edit, and analyze common office document formats: PDFs, Word documents (.docx), PowerPoint presentations (.pptx), and Excel spreadsheets (.xlsx). It uses Python libraries such as pdfplumber, pypdf, pandas, and openpyxl for text extraction, table extraction, merging and splitting PDFs, filling forms, analyzing spreadsheet data, and running OCR (Optical Character Recognition) on scanned documents. Agents can change both the content and the structure of a document.

When to use it

Automating the extraction of data from invoices or reports in PDF format.
Creating new PowerPoint presentations based on textual instructions.
Analyzing data stored within Excel spreadsheets, such as calculating totals or identifying trends.
Merging multiple PDF documents into a single file for easier distribution.
Extracting text from scanned PDFs using OCR to make them searchable and editable.
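
The spreadsheet use case above (totals, averages) might look roughly like the sketch below. It assumes openpyxl is installed, that the first row is a header, and that the data sits on the active sheet. The function names column_average and average_of_column_b are hypothetical, and this is not necessarily how the skill implements the analysis.

```python
from __future__ import annotations

from typing import Iterable, Optional


def column_average(rows: Iterable[tuple], column_index: int) -> Optional[float]:
    """Average the numeric cells of one column; return None if there are none."""
    values = [
        row[column_index]
        for row in rows
        if len(row) > column_index
        and isinstance(row[column_index], (int, float))
        and not isinstance(row[column_index], bool)  # bool is a subclass of int
    ]
    return sum(values) / len(values) if values else None


def average_of_column_b(path: str) -> Optional[float]:
    """Average column B of the active sheet, skipping the assumed header row."""
    from openpyxl import load_workbook  # third-party; imported lazily

    # data_only=True yields cached formula results instead of formula strings.
    workbook = load_workbook(path, read_only=True, data_only=True)
    try:
        rows = workbook.active.iter_rows(min_row=2, values_only=True)
        return column_average(rows, column_index=1)  # index 1 is column B
    finally:
        workbook.close()
```

Keeping the arithmetic in a function that accepts plain tuples means it can be checked without an .xlsx file. Empty and text cells are skipped rather than counted as zero.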

Key capabilities

PDF Processing: Text extraction, table extraction, merging, splitting, form filling, creation, and rotation of pages.
DOCX (Word) Processing: Text extraction, creation, editing.
PPTX (PowerPoint) Processing: Text extraction, creation, editing.
XLSX (Excel) Processing: Data analysis, formula manipulation, formatting.
OCR for Scanned PDFs: Extracts text from scanned PDF documents using Optical Character Recognition.
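
As an illustration of the PDF splitting capability listed above, here is a minimal sketch that uses pypdf, assuming it is installed. The helper names chunk_page_ranges and split_pdf and the "_partN" file naming are hypothetical, and the skill's own implementation may differ.

```python
from __future__ import annotations

from pathlib import Path


def chunk_page_ranges(total_pages: int, pages_per_file: int) -> list[range]:
    """Split page indices 0..total_pages-1 into consecutive chunks."""
    if pages_per_file < 1:
        raise ValueError("pages_per_file must be at least 1")
    return [
        range(start, min(start + pages_per_file, total_pages))
        for start in range(0, total_pages, pages_per_file)
    ]


def split_pdf(source: str, out_dir: str, pages_per_file: int) -> list[Path]:
    """Write each chunk of `source` to its own PDF and return the new paths."""
    from pypdf import PdfReader, PdfWriter  # third-party; imported lazily

    reader = PdfReader(source)
    chunks = chunk_page_ranges(len(reader.pages), pages_per_file)
    out_paths = []
    for number, pages in enumerate(chunks, start=1):
        writer = PdfWriter()
        for index in pages:
            writer.add_page(reader.pages[index])
        # Hypothetical naming scheme: <source stem>_part<N>.pdf
        out_path = Path(out_dir) / f"{Path(source).stem}_part{number}.pdf"
        with open(out_path, "wb") as handle:
            writer.write(handle)
        out_paths.append(out_path)
    return out_paths
```

The page-range arithmetic is kept separate from the file I/O so that the chunking can be checked on its own. The last chunk is shorter when the page count is not a multiple of pages_per_file.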

Example prompts

"Extract all tables from this document and save them to an Excel file."
"Create a new PowerPoint presentation with the title 'Project Update' and three slides summarizing key findings."
"Merge these three PDFs into a single document named 'CombinedReport.pdf'."
"Perform data analysis on this spreadsheet, calculating the average value in column B."

Tips & gotchas

The skill relies on external Python libraries such as pdfplumber, pypdf, pandas, and openpyxl; make sure they are installed in the environment where the skill runs.
OCR accuracy depends heavily on the quality of the scanned document. Poor scans will result in inaccurate text extraction.
Editing DOCX and PPTX files means unpacking the ZIP archive, modifying the XML inside it, and repacking it, which can be complex.
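
To make the unpack, modify, and repack cycle concrete, here is a minimal sketch that uses only Python's standard library. It assumes the text to change sits inside a single XML part (by default word/document.xml, the main body of a .docx file). The function name replace_in_docx_part is hypothetical. Word often splits visible text across several runs, so a plain string replacement like this one will miss matches that span run boundaries.

```python
import io
import zipfile
from xml.sax.saxutils import escape


def replace_in_docx_part(
    docx_bytes: bytes, old: str, new: str, part: str = "word/document.xml"
) -> bytes:
    """Return a new archive with `old` replaced by `new` inside one XML part."""
    source = zipfile.ZipFile(io.BytesIO(docx_bytes))
    output = io.BytesIO()
    with source, zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as target:
        for item in source.infolist():
            data = source.read(item.filename)
            if item.filename == part:
                # Text is stored XML-escaped, so escape both search and replacement.
                text = data.decode("utf-8")
                data = text.replace(escape(old), escape(new)).encode("utf-8")
            # Every other member is copied through unchanged, in the same order.
            target.writestr(item, data)
    return output.getvalue()
```

Working on bytes in memory leaves the original file untouched until the caller chooses to write the result. The same approach applies to .pptx files by passing a different part name, for example a slide's XML part.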

TrustedSkills Verification

Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.

Security Audits

Gen Agent Trust Hub	Pass
Socket	Pass
Snyk	Pass

Details

Version: latest
License
Author: dirnbauer
Installs: 40

Passed automated security scans.