Word Analysis
Word frequency analysis of comments
Word Statistics
- Total Comments: 223,726
- Average Length: 1,000 chars
- Median Length: 939 chars
- Shortest Comment: 1 char
- Longest Comment: 6,628 chars
Word Frequency Analysis
Analysis of words appearing in comments under 200 characters (excluding common stop words)
Word Frequency Data
| Word Stem | Frequency | Percentage | Original Forms |
|---|---|---|---|
| the | 163280 | 8.01% | |
| and | 112550 | 5.52% | |
| our | 67789 | 3.32% | |
| roadless | 57488 | 2.82% | |
| for | 41319 | 2.03% | |
| not | 38413 | 1.88% | |
| this | 37258 | 1.83% | |
| rule | 30860 | 1.51% | |
| that | 29435 | 1.44% | |
| are | 24875 | 1.22% | |
| please | 24382 | 1.20% | |
| public | 23026 | 1.13% | |
| keep | 19006 | 0.93% | |
| these | 18959 | 0.93% | |
| lands | 17627 | 0.86% | |
Custom Word Search
Note: This search operates on the entire comment corpus (all comment lengths), unlike the frequency analysis above which examines comments under 200 characters.
Key Insights from Word Analysis
Most Common Word: 'the' appears 163280 times
Coverage: Top 5 words represent 21.7% of word usage
Total Words Analyzed: 706,267 occurrences across the 15 words listed above, in short comments
Unique Words: 15 different meaningful words found
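The coverage figure can be reproduced from the table. The following is a minimal sketch, assuming the Percentage column is each word's count divided by the total number of words analyzed (the page does not state the denominator). The variable names are illustrative only.

```python
# Frequencies and percentages copied from the word frequency table.
rows = [
    ("the", 163280, 8.01), ("and", 112550, 5.52), ("our", 67789, 3.32),
    ("roadless", 57488, 2.82), ("for", 41319, 2.03), ("not", 38413, 1.88),
    ("this", 37258, 1.83), ("rule", 30860, 1.51), ("that", 29435, 1.44),
    ("are", 24875, 1.22), ("please", 24382, 1.20), ("public", 23026, 1.13),
    ("keep", 19006, 0.93), ("these", 18959, 0.93), ("lands", 17627, 0.86),
]

# Share of word usage covered by the five most frequent words.
top5_coverage = sum(pct for _, _, pct in rows[:5])

# Sum of the counts for the 15 listed words.
listed_total = sum(count for _, count, _ in rows)

# Assumption: percentage = count / total, so the first row implies the total.
implied_total = rows[0][1] / (rows[0][2] / 100)

print(f"Top 5 coverage: {top5_coverage:.1f}%")
print(f"Occurrences of listed words: {listed_total:,}")
print(f"Implied total words (estimate): {implied_total:,.0f}")
```

Under that assumption, the 15 listed words account for only part of the analyzed words, so the implied total is an estimate rather than a figure reported by the page.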
Analysis Methodology
Word Analysis Process:
- Short comments analysis: Only comments under 200 characters are analyzed
- All comments analysis: Includes all comments regardless of length
- Professional stemming: Uses PostgreSQL's `ts_lexize('english_stem')` for linguistic accuracy
- Original forms tracking: Shows which words were combined (e.g., "forest, forests")
- Common stop words are filtered out (the, and, this, etc.)
- Domain-specific words like 'roadless', 'rule' are excluded
- Punctuation is removed and words are normalized to lowercase
- Only words longer than 2 characters are included
- Short comments: Words must appear at least 20 times
- All comments: Words must appear at least 50 times
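The steps listed above can be sketched in Python. This is an illustration under stated assumptions, not the site's implementation, which runs in PostgreSQL. The function `word_frequencies`, the `STOP_WORDS` subset, and the `stem` parameter (an identity stand-in for the PostgreSQL stemmer) are all hypothetical. The sketch also assumes percentages are taken over all words that survive filtering.

```python
import re
from collections import Counter, defaultdict

# Illustrative subset only; the full stop-word list is not given in the source.
STOP_WORDS = {"the", "and", "this"}
MAX_SHORT_LEN = 200    # short comments are under 200 characters
MIN_WORD_LEN = 3       # only words longer than 2 characters
MIN_COUNT_SHORT = 20   # minimum occurrences for the short-comment analysis


def word_frequencies(comments, stem=lambda w: w, min_count=MIN_COUNT_SHORT):
    """Return (stem, count, percentage, original_forms) rows for short comments."""
    counts = Counter()
    forms = defaultdict(set)
    for text in comments:
        if len(text) >= MAX_SHORT_LEN:
            continue  # skip comments that are not "short"
        # Lowercase, then keep runs of letters; punctuation acts as a separator.
        for word in re.findall(r"[a-z]+", text.lower()):
            if len(word) < MIN_WORD_LEN or word in STOP_WORDS:
                continue
            key = stem(word)
            counts[key] += 1
            forms[key].add(word)  # track which original forms were combined
    total = sum(counts.values())
    return [
        (key, n, round(100 * n / total, 2), sorted(forms[key]))
        for key, n in counts.most_common()
        if n >= min_count
    ]
```

For the all-comments analysis, the same sketch would drop the length check and pass `min_count=50`. Note that the letter-only pattern splits contractions such as "don't" into separate tokens, which the length filter then mostly discards.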