Content Similarity Checker

🌐 Community
by dkyazzentwatwa · vlatest

This tool compares two texts and scores how similar they are, which helps with plagiarism detection and duplicate content identification.

Install on your platform

1. Run in terminal (recommended)

claude mcp add content-similarity-checker npx -- -y @trustedskills/content-similarity-checker

2. Or manually add to ~/.claude/settings.json

{
  "mcpServers": {
    "content-similarity-checker": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/content-similarity-checker"
      ]
    }
  }
}

Requires Claude Code (claude CLI). Run claude --version to verify your install.

About This Skill

What it does

The Content Similarity Checker analyzes text to determine how similar two pieces of content are. It calculates a similarity score using algorithms such as Cosine Similarity (TF-IDF based), Jaccard Similarity (set-based comparison), and Levenshtein Distance (edit distance). The tool can compare individual texts or files, run batch comparisons across multiple documents, and generate detailed reports on the results.
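
To make the TF-IDF based cosine comparison more concrete, here is a minimal pure-Python sketch. It is not the skill's actual implementation, which is not shown on this page. The function names (tokenize, tfidf_vectors, cosine_similarity), the regex tokenizer, and the smoothed IDF formula are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (illustrative tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        # Smoothed IDF (an assumption) so terms shared by every document keep a nonzero weight.
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    vecs = tfidf_vectors(["The quick brown fox", "A fast brown fox", "Lorem ipsum dolor"])
    print(cosine_similarity(vecs[0], vecs[1]))  # partial overlap: between 0 and 1
    print(cosine_similarity(vecs[0], vecs[2]))  # no shared terms
```

Because TF-IDF weights are never negative, the cosine value stays between 0.0 and 1.0, which matches the score range described under Tips & gotchas.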

When to use it

  • Plagiarism Detection: Identify potential instances of plagiarism in student work or published content.
  • Duplicate Content Identification: Find duplicate articles or pages on a website for SEO purposes.
  • Content Revision: Assess how much two versions of a document have changed during revisions.
  • Typo/Error Detection (for short texts): Identify minor differences and potential typos between short pieces of text using Levenshtein Distance.
  • Finding Similar Documents: Quickly identify documents within a folder that are most similar to each other or to a given input text.
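
The typo-detection use case relies on edit distance. The sketch below shows the standard dynamic-programming computation of Levenshtein distance and one common way to scale it into a 0.0 to 1.0 score. The function names and the choice to divide by the longer string's length are assumptions for illustration; the skill may normalize differently.

```python
def levenshtein_distance(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a, b):
    """Scale the distance into [0.0, 1.0]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical (assumption)
    return 1.0 - levenshtein_distance(a, b) / longest


if __name__ == "__main__":
    print(levenshtein_distance("kitten", "sitting"))  # 3
    print(levenshtein_similarity("receive", "recieve"))
```

The running time grows with the product of the two string lengths, which is one reason edit distance suits short texts better than long documents.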

Key capabilities

  • Cosine Similarity: Compares documents based on TF-IDF (Term Frequency-Inverse Document Frequency).
  • Jaccard Similarity: Compares the overlap of words/tokens between two texts.
  • Levenshtein Distance: Measures the edit distance between short texts, useful for typo detection.
  • Batch Comparison: Compares multiple documents against each other.
  • Similarity Matrix Generation: Creates a matrix showing pairwise similarity scores between all documents in a set.
  • Detailed Reports: Provides comprehensive reports on comparison results.
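
As an illustration of how Jaccard Similarity and Similarity Matrix Generation fit together, the following sketch scores every pair in a small document set. The helper names (token_set, jaccard_similarity, similarity_matrix) and the tokenizer are hypothetical, and the output layout of the real skill may differ.

```python
import re


def token_set(text):
    """Lowercase the text and return its set of unique word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard_similarity(a, b):
    """Size of the token intersection divided by the size of the token union."""
    set_a, set_b = token_set(a), token_set(b)
    union = set_a | set_b
    if not union:
        return 1.0  # two empty texts are treated as identical (assumption)
    return len(set_a & set_b) / len(union)


def similarity_matrix(docs):
    """Symmetric matrix of pairwise Jaccard scores with 1.0 on the diagonal."""
    n = len(docs)
    matrix = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            score = jaccard_similarity(docs[i], docs[j])
            matrix[i][j] = matrix[j][i] = score
    return matrix


if __name__ == "__main__":
    docs = ["the quick brown fox", "the fast brown fox", "hello world"]
    for row in similarity_matrix(docs):
        print([round(score, 2) for score in row])
```

Only the upper triangle is computed because the score is symmetric, which roughly halves the work for large batches.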

Example prompts

  • "Compare the similarity between 'The quick brown fox...' and 'A fast brown fox...'"
  • "Check the similarity of doc1.txt and doc2.txt."
  • "Find documents in my './documents/' folder that are similar to a given text, using Jaccard Similarity."

Tips & gotchas

  • Algorithm Selection: Choose the appropriate algorithm based on the type of content being compared (Cosine for long documents, Levenshtein for short texts).
  • Normalization: Similarity scores range from 0.0 to 1.0, where higher values indicate more similar content.
  • Thresholds: When using batch comparison or folder analysis, experiment with different threshold values to fine-tune the results.
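
The effect of a threshold on folder analysis can be sketched as follows. This example uses Python's standard-library difflib.SequenceMatcher as a stand-in scorer, not one of the skill's three algorithms. The function name find_similar, the "*.txt" filter, and the default threshold of 0.5 are assumptions for illustration only.

```python
from difflib import SequenceMatcher
from pathlib import Path


def find_similar(query, folder, threshold=0.5):
    """Return (filename, score) pairs scoring at or above threshold, best first."""
    matches = []
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        # ratio() returns a float in [0.0, 1.0]; 1.0 means the sequences are identical.
        score = SequenceMatcher(None, query, text).ratio()
        if score >= threshold:
            matches.append((path.name, score))
    return sorted(matches, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Raising the threshold narrows the result list; lowering it widens it.
    for name, score in find_similar("The quick brown fox", "./documents/", threshold=0.7):
        print(f"{name}: {score:.2f}")
```

Running the same query at a few different thresholds and inspecting what drops out is a quick way to pick a cutoff for a particular document set.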

TrustedSkills Verification

Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.

Security Audits

  • Gen Agent Trust Hub: Pass
  • Socket: Pass
  • Snyk: Pass

Details

  • Version: vlatest
  • License: (not listed)
  • Author: dkyazzentwatwa
  • Installs: 38

🌐 Community

Passed automated security scans.