How to detect which language a text is written in?
Something we may want to do someday in Flock, if we want to use the right tokenizer to index history…
How to detect which language a text is written in? Or when science meets human!
Categories: flock

An older algorithm, used by SpamAssassin and apparently also by Maciej Ceglowski back when he was working on the NITLE blog census (http://www.hirank.com/semantic-indexing-project/census/lang.html)
http://odur.let.rug.nl/~vannoord/TextCat/
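
For the curious, here is a rough sketch of the kind of technique TextCat is built on: character n-gram profiles compared with an "out-of-place" rank distance (the Cavnar and Trenkle approach). This is not TextCat's actual code. The function names, the tiny sample texts and the parameters (`max_n`, `top_k`) are all made up for illustration, and a real detector would train its profiles on far more text per language.

```python
from collections import Counter


def ngram_profile(text, max_n=3, top_k=300):
    """Return the top_k most frequent character n-grams, most frequent first."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # mark word boundaries with underscores
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from lang_profile get a fixed penalty."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    penalty = len(lang_profile)
    return sum(
        abs(i - rank[gram]) if gram in rank else penalty
        for i, gram in enumerate(doc_profile)
    )


def detect_language(text, profiles):
    """Pick the language whose profile is closest to the text's profile."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))


# Hypothetical, deliberately tiny training samples (assumption: real profiles
# would be built from much larger corpora).
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and then the dog "
               "runs away from the fox while the cat is watching them",
    "french": "le renard brun saute par dessus le chien paresseux et puis le "
              "chien s'enfuit loin du renard pendant que le chat les regarde",
}
PROFILES = {lang: ngram_profile(text) for lang, text in SAMPLES.items()}

if __name__ == "__main__":
    print(detect_language("the dog is watching the cat", PROFILES))
```

With per-language profiles in hand, picking a tokenizer for indexing would come down to a lookup keyed on whatever `detect_language` returns.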