Word Analysis

Word frequency analysis of comments

Word Statistics

Total Comments
223,726
Average Length
1000 chars
Median Length
939 chars
Shortest Comment
1 char
Longest Comment
6,628 chars

Word Frequency Analysis

Analysis of words appearing in comments under 200 characters (excluding common stop words)


Word Frequency Data

Word Stem    Frequency    Percentage    Original Forms
the          163,280      8.01%
and          112,550      5.52%
our          67,789       3.32%
roadless     57,488       2.82%
for          41,319       2.03%
not          38,413       1.88%
this         37,258       1.83%
rule         30,860       1.51%
that         29,435       1.44%
are          24,875       1.22%
please       24,382       1.20%
public       23,026       1.13%
keep         19,006       0.93%
these        18,959       0.93%
lands        17,627       0.86%

Custom Word Search

Note: This search operates on the entire comment corpus (all comment lengths), unlike the frequency analysis above, which examines only comments under 200 characters.

Key Insights from Word Analysis

Most Common Word: 'the' appears 163,280 times

Coverage: Top 5 words represent 21.7% of word usage

Total Words Analyzed: 706,267 words in short comments (the sum of the 15 frequencies listed above)

Unique Words: 15 different meaningful words found
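
The figures in this section can be re-derived from the frequency table with simple arithmetic. The sketch below is illustrative only: the `rows` list is transcribed from the table, and the variable names are assumptions rather than anything taken from the analysis tool itself. Coverage is computed from the published, rounded percentages, so it is accurate only to that rounding.

```python
# Rows transcribed from "Word Frequency Data": (word stem, frequency, percentage).
rows = [
    ("the", 163280, 8.01), ("and", 112550, 5.52), ("our", 67789, 3.32),
    ("roadless", 57488, 2.82), ("for", 41319, 2.03), ("not", 38413, 1.88),
    ("this", 37258, 1.83), ("rule", 30860, 1.51), ("that", 29435, 1.44),
    ("are", 24875, 1.22), ("please", 24382, 1.20), ("public", 23026, 1.13),
    ("keep", 19006, 0.93), ("these", 18959, 0.93), ("lands", 17627, 0.86),
]

# Most common word: the row with the highest frequency.
top_word, top_count, _ = max(rows, key=lambda r: r[1])

# Coverage: sum of the published percentages of the five most frequent words.
top5 = sorted(rows, key=lambda r: r[1], reverse=True)[:5]
top5_coverage = round(sum(pct for _, _, pct in top5), 2)

# Numeric sum of the listed frequencies (values must be numbers, not strings).
listed_total = sum(count for _, count, _ in rows)

print(top_word, top_count)   # the 163280
print(top5_coverage)         # 21.7
print(listed_total)          # 706267
```

Summing the frequencies as numbers matters here: concatenating them as text would produce one long digit string instead of a total.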

Analysis Methodology

Word Analysis Process:

  • Short comments analysis: Only comments under 200 characters are analyzed
  • All comments analysis: Includes all comments regardless of length
  • Professional stemming: Uses PostgreSQL's `ts_lexize('english_stem')` for linguistic accuracy
  • Original forms tracking: Shows which words were combined (e.g., "forest, forests")
  • Common stop words are filtered out (the, and, this, etc.)
  • Domain-specific words like 'roadless', 'rule' are excluded
  • Punctuation is removed and words are normalized to lowercase
  • Only words longer than 2 characters are included
  • Short comments: Words must appear at least 20 times
  • All comments: Words must appear at least 50 times
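
The steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the stop-word list is abbreviated and assumed, `naive_stem` is a toy stand-in for PostgreSQL's `english_stem` dictionary, and the function and parameter names are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Assumed, abbreviated lists; the full lists used by the tool are not given.
STOP_WORDS = {"the", "and", "this", "that", "for", "are", "not"}
DOMAIN_WORDS = {"roadless", "rule"}

def naive_stem(word):
    """Toy stemmer (assumption): drop one trailing 's', but not from 'ss'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def word_frequencies(comments, max_len=200, min_count=20, stem=naive_stem):
    """Count stems in comments shorter than max_len (None = all comments)."""
    excluded = STOP_WORDS | DOMAIN_WORDS
    counts = Counter()
    forms = defaultdict(set)  # stem -> original word forms merged into it
    for text in comments:
        if max_len is not None and len(text) >= max_len:
            continue
        # Lowercase, then replace anything that is not a letter with a space.
        words = re.sub(r"[^a-z\s]", " ", text.lower()).split()
        for word in words:
            if len(word) <= 2 or word in excluded:
                continue
            key = stem(word)
            counts[key] += 1
            forms[key].add(word)
    # Keep only stems that meet the minimum frequency threshold.
    return {s: (n, sorted(forms[s])) for s, n in counts.items() if n >= min_count}

comments = ["Keep our forests roadless!", "Protect the forest.", "x" * 300]
result = word_frequencies(comments, min_count=2)
print(result)  # {'forest': (2, ['forest', 'forests'])}
```

For the all-comments analysis, the same function would be called with `max_len=None` and `min_count=50`. The 300-character comment in the example is skipped by the short-comment filter, and "forest" and "forests" are merged under one stem with both original forms recorded.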