Count Dataset Tokens

🌐 Community
by letta-ai · vlatest · Repository

This skill quickly counts the number of tokens in a dataset, useful for understanding data size and resource needs.

Install on your platform

We auto-selected Claude Code based on this skill’s supported platforms.

1. Run in terminal (recommended)

   claude mcp add count-dataset-tokens npx -- -y @trustedskills/count-dataset-tokens
2. Or manually add to ~/.claude/settings.json
{
  "mcpServers": {
    "count-dataset-tokens": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/count-dataset-tokens"
      ]
    }
  }
}

Requires Claude Code (claude CLI). Run claude --version to verify your install.

About This Skill

The count-dataset-tokens skill enables AI agents to calculate the total token count within a specified dataset. This capability allows agents to estimate context window usage and manage input sizes effectively before processing large volumes of text data.

When to use it

  • Context Window Management: Determine if a specific document or file fits within the model's maximum context limits.
  • Cost Estimation: Calculate potential API costs associated with processing a batch of documents based on token volume.
  • Data Sampling: Identify representative subsets of data by analyzing total token distribution across a dataset.
  • Prompt Engineering: Verify that combined system instructions and user inputs do not exceed operational thresholds.
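
The cost-estimation use case above is simple arithmetic once you have a token count. A minimal sketch, assuming a hypothetical rate of $3 per million input tokens (real prices vary by model and provider; the function name is illustrative, not part of the skill):

```python
# Hypothetical example rate, not any real model's price.
PRICE_PER_MILLION = 3.00  # USD per 1M input tokens (assumed for illustration)

def estimate_cost(token_count: int, price_per_million: float = PRICE_PER_MILLION) -> float:
    """Return the estimated USD cost of processing `token_count` tokens."""
    return token_count / 1_000_000 * price_per_million

# A 250k-token batch at $3 per million tokens:
print(f"${estimate_cost(250_000):.2f}")  # prints $0.75
```

Swap in the actual per-token price for your model to get a real estimate.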

Key capabilities

  • Accepts file paths or raw text strings as input targets.
  • Returns precise integer values representing the total token count.
  • Supports various text encodings commonly used in LLM pipelines.
  • Integrates seamlessly with Letta AI agent workflows for automated data validation.
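
The page does not document the skill's internal tokenizer, so exact behavior is an assumption. As a rough stand-in for the input handling the capabilities list describes (file path or raw text in, integer out), here is a sketch using the common "~4 characters per token" heuristic; `estimate_tokens` is a hypothetical name, not the skill's API:

```python
import os

def estimate_tokens(target: str) -> int:
    """Rough token estimate for a file path or a raw text string.

    Uses the common ~4 characters per token heuristic; use a real
    tokenizer library when exact counts matter.
    """
    if os.path.isfile(target):
        with open(target, encoding="utf-8") as f:
            text = f.read()
    else:
        text = target
    # Round up so short non-empty inputs never report zero tokens.
    return -(-len(text) // 4)

print(estimate_tokens("Hello, world!"))  # 13 chars -> 4 estimated tokens
```

The real skill returns exact counts from its tokenizer; this heuristic is only for ballpark sizing.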

Example prompts

  • "Count the tokens in this customer support log file: [path/to/logs.json]"
  • "Estimate how many tokens are required to process a 50-page PDF report."
  • "Check if the combined token count of my uploaded documents exceeds 128k."

Tips & gotchas

Token counting algorithms may vary slightly depending on the underlying tokenizer model; ensure consistency when comparing counts across different runs. Always verify the specific encoding settings used by your agent configuration to match the counting logic accurately.
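
To see why counts should only be compared against the same tokenizer, here is a stdlib-only illustration in which two common estimation methods disagree on the same text (neither is the skill's actual tokenizer):

```python
text = "Tokenization results depend on the tokenizer you choose."

# Method 1: whitespace word count (a crude lower bound for English prose).
words = len(text.split())

# Method 2: the ~4 characters per token heuristic, rounded up.
chars_over_4 = -(-len(text) // 4)

print(words, chars_over_4)  # prints 8 14 -- same input, different estimates
```

Mixing counts from different methods (or different tokenizer models) in one comparison will silently skew context-window and cost estimates.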

🛡️ TrustedSkills Verification

Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.

Security Audits

  • Gen Agent Trust Hub: Pass
  • Socket: Pass
  • Snyk: Pass

Details

  • Version: vlatest
  • License:
  • Author: letta-ai
  • Installs: 28
