Count Dataset Tokens

🌐 Community
by letta-ai · vlatest · Repository

This skill quickly counts the number of tokens in a dataset, useful for understanding data size and resource needs.

Install on your platform

We auto-selected Claude Code based on this skill’s supported platforms.

1. Run in terminal (recommended)

   claude mcp add count-dataset-tokens npx -- -y @trustedskills/count-dataset-tokens
2. Or manually add to ~/.claude/settings.json
{
  "mcpServers": {
    "count-dataset-tokens": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/count-dataset-tokens"
      ]
    }
  }
}

Requires Claude Code (claude CLI). Run claude --version to verify your install.

About This Skill

The count-dataset-tokens skill enables AI agents to calculate the total token count within a specified dataset. This capability allows agents to estimate context window usage and manage input sizes effectively before processing large volumes of text data.

When to use it

  • Context Window Management: Determine if a specific document or file fits within the model's maximum context limits.
  • Cost Estimation: Calculate potential API costs associated with processing a batch of documents based on token volume.
  • Data Sampling: Identify representative subsets of data by analyzing total token distribution across a dataset.
  • Prompt Engineering: Verify that combined system instructions and user inputs do not exceed operational thresholds.
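
The cost-estimation use case above is simple arithmetic once you have a token count. A minimal sketch, assuming a hypothetical rate of $3 per million input tokens (real prices vary by model and provider; the function name is illustrative, not part of the skill):

```python
# Hypothetical example rate, not any real model's price.
PRICE_PER_MILLION = 3.00  # USD per 1M input tokens (assumed for illustration)

def estimate_cost(token_count: int, price_per_million: float = PRICE_PER_MILLION) -> float:
    """Return the estimated USD cost of processing `token_count` tokens."""
    return token_count / 1_000_000 * price_per_million

# A 250k-token batch at $3 per million tokens:
print(f"${estimate_cost(250_000):.2f}")  # prints $0.75
```

Swap in the actual per-token price for your model to get a real estimate.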

Key capabilities

  • Accepts file paths or raw text strings as input targets.
  • Returns precise integer values representing the total token count.
  • Supports various text encodings commonly used in LLM pipelines.
  • Integrates seamlessly with Letta AI agent workflows for automated data validation.
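
The page does not document the skill's internal tokenizer, so exact behavior is an assumption. As a rough stand-in for the input handling the capabilities list describes (file path or raw text in, integer out), here is a sketch using the common "~4 characters per token" heuristic; `estimate_tokens` is a hypothetical name, not the skill's API:

```python
import os

def estimate_tokens(target: str) -> int:
    """Rough token estimate for a file path or a raw text string.

    Uses the common ~4 characters per token heuristic; use a real
    tokenizer library when exact counts matter.
    """
    if os.path.isfile(target):
        with open(target, encoding="utf-8") as f:
            text = f.read()
    else:
        text = target
    # Round up so short non-empty inputs never report zero tokens.
    return -(-len(text) // 4)

print(estimate_tokens("Hello, world!"))  # 13 chars -> 4 estimated tokens
```

The real skill returns exact counts from its tokenizer; this heuristic is only for ballpark sizing.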

Example prompts

  • "Count the tokens in this customer support log file: [path/to/logs.json]"
  • "Estimate how many tokens are required to process a 50-page PDF report."
  • "Check if the combined token count of my uploaded documents exceeds 128k."

Tips & gotchas

Token counting algorithms may vary slightly depending on the underlying tokenizer model; ensure consistency when comparing counts across different runs. Always verify the specific encoding settings used by your agent configuration to match the counting logic accurately.
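
To see why counts should only be compared against the same tokenizer, here is a stdlib-only illustration in which two common estimation methods disagree on the same text (neither is the skill's actual tokenizer):

```python
text = "Tokenization results depend on the tokenizer you choose."

# Method 1: whitespace word count (a crude lower bound for English prose).
words = len(text.split())

# Method 2: the ~4 characters per token heuristic, rounded up.
chars_over_4 = -(-len(text) // 4)

print(words, chars_over_4)  # prints 8 14 -- same input, different estimates
```

Mixing counts from different methods (or different tokenizer models) in one comparison will silently skew context-window and cost estimates.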

🛡️ TrustedSkills Verification

Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.

Security Audits

  • Gen Agent Trust Hub: Pass
  • Socket: Pass
  • Snyk: Pass

Details

  • Version: vlatest
  • License:
  • Author: letta-ai
  • Installs: 28
