Text Detection
Manuscript uses statistical analysis to differentiate human-written text from AI-generated content.
Detection Signals
Sentence Length Variance
Human writing naturally varies in sentence length. AI tends to produce more uniform sentences.
| Pattern | Human | AI |
|---|---|---|
| Short sentences | Common | Rare |
| Long sentences | Common | Common |
| Variation | High | Low |
Weight: 0.15
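As a rough illustration of this signal, the sketch below measures how much sentence lengths vary using the coefficient of variation (standard deviation divided by mean). The function names, the naive sentence splitter, and the choice of statistic are assumptions for illustration; the page does not say how Manuscript computes this value.

```python
import re
import statistics


def sentence_lengths(text: str) -> list[int]:
    """Word count of each sentence, splitting naively on ., !, ? and the ellipsis character."""
    sentences = re.split(r"[.!?…]+", text)
    return [len(s.split()) for s in sentences if s.strip()]


def sentence_length_variation(text: str) -> float:
    """Coefficient of variation of sentence lengths (0.0 means perfectly uniform)."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # not enough sentences to measure variation
    return statistics.pstdev(lengths) / statistics.mean(lengths)
```

Under this measure, a text whose sentences all have the same length scores 0.0, and a mix of very short and very long sentences scores higher.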
Vocabulary Richness
Humans use personal vocabulary, including rare words, slang, and domain-specific terms. AI uses statistically “safe” common words.
Weight: 0.20
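One common proxy for vocabulary richness is the type-token ratio: distinct words divided by total words. The sketch below assumes that measure purely for illustration; Manuscript's actual metric is not documented here, and the function name is hypothetical.

```python
import re


def type_token_ratio(text: str) -> float:
    """Distinct words divided by total words, case-insensitive."""
    # Keep straight and curly apostrophes so contractions stay single tokens.
    words = re.findall(r"[a-z'’]+", text.lower())
    if not words:
        return 0.0
    return len(set(words)) / len(words)
```

The ratio tends to fall as texts get longer, so it is only meaningful when comparing samples of similar length.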
Contraction Usage
Humans naturally use contractions (“don’t”, “I’m”, “we’ll”). AI often uses formal forms (“do not”, “I am”).
| Form | Human | AI |
|---|---|---|
| “don’t” | Very common | Rare |
| “do not” | Rare (formal) | Common |
| “I’m” | Very common | Rare |
| “I am” | Rare (formal) | Common |
Weight: 0.10
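A contraction ratio could be approximated with a regular expression, as in the sketch below. The pattern and the function name are assumptions, and the suffix list is deliberately small; note that it also counts possessives such as "John's", which a real implementation would probably want to exclude.

```python
import re

# Matches common English contraction endings after a straight or curly apostrophe.
# Caveat: the "s" suffix also matches possessives ("John's").
CONTRACTION = re.compile(r"\b\w+['’](?:t|s|m|re|ve|ll|d)\b", re.IGNORECASE)


def contraction_ratio(text: str) -> float:
    """Contractions divided by whitespace-separated words."""
    words = text.split()
    if not words:
        return 0.0
    return len(CONTRACTION.findall(text)) / len(words)
```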
AI Phrase Detection
Certain phrases are strong indicators of AI generation:
- “As an AI…”
- “It’s important to note…”
- “I don’t have personal experiences…”
- “Let me break this down…”
- “That’s a great question…”
- “In summary…”
- “Additionally…”
Manuscript maintains a database of 35+ such patterns.
Weight: 0.20
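Phrase matching of this kind can be sketched as a case-insensitive lookup against a list. The list below contains only the phrases shown on this page (plus the expanded form "it is important to note" that appears in the example analysis), not Manuscript's full pattern database, and the function name is hypothetical.

```python
import re

# Phrases taken from this page only; the real database is larger.
AI_PHRASES = [
    "as an ai",
    "it's important to note",
    "it is important to note",
    "i don't have personal experiences",
    "let me break this down",
    "that's a great question",
    "in summary",
    "additionally",
]


def find_ai_phrases(text: str) -> list[str]:
    """Return the listed phrases that occur in the text, in list order."""
    # Lowercase and normalize curly apostrophes so "It’s" matches "it's".
    haystack = text.lower().replace("’", "'")
    return [
        phrase
        for phrase in AI_PHRASES
        if re.search(r"\b" + re.escape(phrase) + r"\b", haystack)
    ]
```

The word-boundary anchors keep "as an ai" from matching inside longer words such as "as an aide".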
Hedging Language
AI often uses excessive qualifiers and hedging:
- “It’s possible that…”
- “Generally speaking…”
- “In most cases…”
- “It depends on…”
Weight: 0.10
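Because the signal is about excessive hedging rather than any single hedge, one plausible measure is hedges per sentence. The sketch below assumes that definition and uses only the four phrases listed above; the scale of the `hedging_score` field in the API response is not documented, so this is not a reproduction of it.

```python
import re

# Hedging phrases taken from the list on this page.
HEDGES = (
    "it's possible that",
    "generally speaking",
    "in most cases",
    "it depends on",
)


def hedging_density(text: str) -> float:
    """Number of hedging phrases divided by number of sentences."""
    normalized = text.lower().replace("’", "'")
    sentences = [s for s in re.split(r"[.!?]+", normalized) if s.strip()]
    if not sentences:
        return 0.0
    hits = sum(normalized.count(hedge) for hedge in HEDGES)
    return hits / len(sentences)
```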
Punctuation Variety
Humans use diverse punctuation (!?;:—…). AI primarily uses periods and commas.
Weight: 0.10
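A simple way to quantify variety is the share of distinct punctuation marks that appear at least once. The mark set below is taken from the sentence above (plus period and comma); the function name and the normalization of three dots to an ellipsis are assumptions.

```python
# Period, comma, and the marks named on this page.
MARKS = frozenset(".,!?;:—…")


def punctuation_variety(text: str) -> float:
    """Fraction of the marks in MARKS that occur at least once in the text."""
    # Treat three consecutive dots as a single ellipsis character.
    normalized = text.replace("...", "…")
    used = {ch for ch in normalized if ch in MARKS}
    return len(used) / len(MARKS)
```

A text that uses only periods and commas scores 0.25 (2 of 8 marks) under this measure.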
Repetition Patterns
AI tends to repeat structural patterns mechanically. Human writers refer back to earlier ideas more loosely and less predictably.
Weight: 0.10
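Structural repetition is hard to pin down, but one crude proxy is how often sentences start with the same words. The sketch below assumes that proxy; the function name and the two-word opener length are arbitrary choices, not Manuscript's documented method.

```python
import re
from collections import Counter


def opener_repetition(text: str, n: int = 2) -> float:
    """Share of sentences whose first n words are shared with another sentence."""
    sentences = [s.split() for s in re.split(r"[.!?]+", text.lower()) if s.strip()]
    openers = [tuple(words[:n]) for words in sentences]
    if not openers:
        return 0.0
    counts = Counter(openers)
    repeated = sum(count for count in counts.values() if count > 1)
    return repeated / len(openers)
```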
Example Analysis
Human-written text:
```text
I've been thinking about this for days. Can't shake the feeling that
something's off—you know what I mean? The data just... doesn't add up.
```
Signals detected:
- High sentence variance ✓
- Contractions used (“I’ve”, “Can’t”) ✓
- Diverse punctuation (?, —, …) ✓
- Natural hedging ✓
Verdict: Human (confidence: 0.92)
AI-generated text:
```text
It is important to note that this analysis provides valuable insights.
Additionally, the data suggests several key findings. In summary, the
results demonstrate significant patterns that warrant further investigation.
```
Signals detected:
- Low sentence variance ✓
- No contractions ✓
- AI phrases detected (“It is important to note”, “Additionally”, “In summary”) ✓
- Formulaic structure ✓
Verdict: AI (confidence: 0.94)
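The weights listed under Detection Signals suggest that the individual signals are combined into a single score. The sketch below shows one way that could work, as a weighted average of per-signal scores. The signal names and the convention that each score lies in [0, 1] with 1.0 meaning human-like are assumptions, and how Manuscript turns the result into a verdict and confidence is not documented here. The seven documented weights sum to 0.95 rather than 1.0, so the sketch divides by their total.

```python
# Weights as documented on this page; the key names are hypothetical.
WEIGHTS = {
    "sentence_variance": 0.15,
    "vocabulary_richness": 0.20,
    "contraction_usage": 0.10,
    "ai_phrases": 0.20,
    "hedging": 0.10,
    "punctuation_variety": 0.10,
    "repetition": 0.10,
}


def human_score(signals: dict[str, float]) -> float:
    """Weighted average of per-signal scores, each assumed to be in [0, 1]."""
    total_weight = sum(WEIGHTS.values())  # 0.95 with the documented weights
    weighted = sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)
    return weighted / total_weight
```

Dividing by the total keeps the result in [0, 1] even though the weights do not sum to exactly 1.0.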
API Usage
```bash
curl -X POST http://localhost:8080/verify \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Your text content here..."
  }'
```
Detailed Response
```json
{
  "id": "hm_abc123",
  "verdict": "human",
  "confidence": 0.87,
  "content_type": "text",
  "signals": {
    "sentence_variance": 0.42,
    "vocabulary_richness": 0.78,
    "contraction_ratio": 0.15,
    "ai_phrases_detected": [],
    "punctuation_variety": 0.65,
    "burstiness": 0.38,
    "hedging_score": 0.22
  },
  "processing_time_ms": 8
}
```
Accuracy Benchmarks
| Metric | Value |
|---|---|
| Accuracy | 90.00% |
| Precision | 100.00% |
| Recall | 80.00% |
| F1 Score | 88.89% |
Tested on 100 samples (50 human, 50 AI). The AI samples include GPT-4, Claude, Gemini, and Llama-3 content.
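With a 50/50 split, the four reported figures pin down a single confusion matrix: 100% precision means no false positives, and 90% accuracy then leaves 10 false negatives. The sketch below recomputes the metrics from those counts. The page does not say which class is treated as positive, so the labels `tp`, `fp`, `fn`, and `tn` are deliberately generic, and the function name is hypothetical.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Standard accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Counts implied by the table: 50 positives (40 found, 10 missed), 50 negatives (none flagged).
metrics = classification_metrics(tp=40, fp=0, fn=10, tn=50)
```

Here `metrics["f1"]` is 2 × 1.0 × 0.8 / 1.8, which rounds to the 88.89% in the table.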
Limitations
- Short texts (<100 words) have lower accuracy
- Heavily edited AI content may evade detection
- Domain-specific jargon can affect vocabulary analysis
- Non-English text support is limited
Best Practices
- Minimum length: Provide at least 100 words for reliable detection
- Original content: Detection works best on unedited content
- Confidence threshold: Treat results with a confidence below 0.7 as uncertain
- Multiple samples: For important decisions, analyze multiple excerpts
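The practices above can be sketched as a small client around the `/verify` endpoint from the API Usage section. It uses only the Python standard library. The helper names, the decision to reject short input instead of sending it, and the `"uncertain"` label are assumptions for illustration, not part of the Manuscript API.

```python
import json
import urllib.request

VERIFY_URL = "http://localhost:8080/verify"  # endpoint from the API Usage example
MIN_WORDS = 100          # minimum length recommended above
UNCERTAIN_BELOW = 0.7    # confidence threshold recommended above


def interpret(result: dict) -> str:
    """Return the verdict, or "uncertain" when confidence is below the threshold."""
    if result["confidence"] < UNCERTAIN_BELOW:
        return "uncertain"
    return result["verdict"]


def verify_text(text: str, url: str = VERIFY_URL) -> dict:
    """POST text to the verify endpoint and return the parsed JSON response."""
    word_count = len(text.split())
    if word_count < MIN_WORDS:
        raise ValueError(f"need at least {MIN_WORDS} words, got {word_count}")
    payload = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

For important decisions, `verify_text` could be called on several excerpts of the same document and the `interpret` results compared before acting on any single verdict.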