Something we may want to do someday in Flock, if we want to use the right tokenizer to index history…
How to detect which language a text is written in? Or when science meets human!
Blogged with Flock
An older algorithm, used by SpamAssassin and apparently also by Maciej Ceglowski back when he worked on the NITLE blog census (http://www.hirank.com/semantic-indexing-project/census/lang.html):
http://odur.let.rug.nl/~vannoord/TextCat/
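TextCat implements Cavnar and Trenkle's character n-gram approach, which works like this:

1. Build a ranked list of the most frequent character n-grams for each language.
2. Build the same kind of list for the text to classify.
3. Pick the language whose list is closest, using an "out-of-place" rank distance.

Below is a minimal Python sketch of that idea. It is not TextCat's code. The parameters (n-grams of length 1 to 5, top 300 kept, `_` as word padding) are assumptions. The `SAMPLES` training texts are hypothetical and far too small for real use; a real profile needs a large corpus per language.

```python
import re
from collections import Counter

# Assumed parameters (not taken from TextCat): n-grams of length 1..5, keep the top 300.
MAX_N = 5
TOP_K = 300


def ngram_profile(text, max_n=MAX_N, top_k=TOP_K):
    """Map each of the top_k most frequent character n-grams to its rank (0 = most frequent)."""
    counts = Counter()
    # Split into runs of letters (accented letters included) and pad each word with "_".
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        padded = f"_{word}_"
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    ranked = [gram for gram, _ in counts.most_common(top_k)]
    return {gram: rank for rank, gram in enumerate(ranked)}


def out_of_place(doc_profile, lang_profile, max_penalty=TOP_K):
    """Sum of rank differences; an n-gram missing from lang_profile costs max_penalty."""
    distance = 0
    for gram, rank in doc_profile.items():
        if gram in lang_profile:
            distance += abs(rank - lang_profile[gram])
        else:
            distance += max_penalty
    return distance


def guess_language(text, profiles):
    """Return the key of the language profile with the smallest out-of-place distance."""
    doc_profile = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc_profile, profiles[lang]))


# Hypothetical, deliberately tiny training samples, for illustration only.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the cat sleeps in the house "
          "while the children are playing with their friends in the garden",
    "fr": "le renard brun rapide saute par dessus le chien paresseux et le chat dort dans "
          "la maison pendant que les enfants jouent avec leurs amis dans le jardin",
}
PROFILES = {lang: ngram_profile(sample) for lang, sample in SAMPLES.items()}

if __name__ == "__main__":
    print(guess_language("the quick brown fox jumps over the lazy dog", PROFILES))
    print(guess_language("le renard brun rapide saute par dessus le chien paresseux", PROFILES))
```

For a history indexer, the guessed language would then select which tokenizer to apply. Short inputs such as page titles give sparse profiles, so accuracy on them will be noticeably lower than on full pages.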