The paper demonstrated how a maintainable and upgradable linguistic data infrastructure for serious
language modeling can be built for a minority language spoken by less than 0.1% of the world's population.
Instead of resorting to World Wide Web crawling for corpus creation, we relied on an existing language
service, the Croatian online spellchecker Hascheck, for large-scale data collection, which proved to be an
economical and reliable method of obtaining a large-scale lexical n-gram system. The benefits of this
approach are demonstrated by the rapid breakthroughs achieved in several NLP application areas.
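As a rough illustration only (not the paper's actual pipeline, whose details are not given here), accumulating lexical n-gram counts from a stream of tokenized text submitted to such a service could be sketched as follows; the function name and the sample sentence are hypothetical:

```python
from collections import Counter

def collect_ngrams(tokens, n):
    """Count contiguous word n-grams in a token sequence."""
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        counts[tuple(tokens[i:i + n])] += 1
    return counts

# Hypothetical usage: words from one text submitted for spellchecking.
words = "jučer sam bio u gradu i kupio sam kruh".split()
bigrams = collect_ngrams(words, 2)
```

In a service-based setting, such per-submission counts would be merged into a global n-gram table as texts arrive, rather than built in one pass over a crawled corpus.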