The linguistic database in InfoCodex contains over three
million classified words/phrases linked to a universal taxonomy (ontology).
This forms the basis for cross-lingual content recognition ("understanding
the content of a document") and for automatic document categorization
by the integrated neural network technology ("grouping the documents
on well-organized bookshelves").
It is chiefly the linguistic database that enables InfoCodex to automatically
match documents according to their content and to categorize these documents
without any cumbersome training.
How is the linguistic database kept up-to-date?
InfoCodex's linguistic database is based on some 100 major
sources, such as Princeton University's WordNet, the EuroVoc
of the EU, and the AGROVOC of the United Nations. It is continuously updated
from these sources and is also extended with the names of new celebrities
and new brands.
Can users supply their own vocabularies and their own
thesaurus?
Yes, they can. For example, a German manufacturer of electronic
components has easily integrated a structured list of 50,000 parts in
the form of a front-end linguistic database; this list will have priority
over the standard InfoCodex linguistic database.
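The priority rule described above can be pictured as a layered lookup in which the customer's list is consulted before the standard database. The following is only a minimal sketch of that idea, not InfoCodex's actual implementation or API: the dictionary names, the sample terms, the taxonomy labels, and the `classify` function are all hypothetical.

```python
from collections import ChainMap

# Hypothetical stand-in for the standard linguistic database:
# term -> taxonomy node.
standard_db = {
    "resistor": "electronics/components",
    "apple": "food/fruit",
}

# Hypothetical customer-supplied front-end list (e.g. a parts catalogue).
front_end_db = {
    "resistor": "parts/R-series",
    "rx-4711": "parts/connectors",
}

# ChainMap searches its mappings in order, so the front-end list
# takes priority and the standard database acts as the fallback.
lookup = ChainMap(front_end_db, standard_db)


def classify(term):
    """Return the taxonomy node for a term, or None if it is unknown."""
    return lookup.get(term.lower())


print(classify("Resistor"))  # found in the front-end list
print(classify("apple"))     # falls back to the standard database
```

In this sketch, a term present in both layers resolves to the front-end entry, while every other term is still covered by the standard database.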
However, in most cases, neither special vocabularies nor
a thesaurus is needed. Adding a thesaurus with 8,000 words to InfoCodex
is not an improvement from 0 to 8,000 words, but merely an increase
from the already existing 3,000,000 words/phrases to 3,008,000, which
represents an increase of less than 0.3%.
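The arithmetic behind this figure can be checked in a few lines. The variable names are illustrative, and the numbers are the ones quoted above.

```python
existing_entries = 3_000_000  # words/phrases already in the database
added_entries = 8_000         # size of the additional thesaurus

total = existing_entries + added_entries
relative_increase = added_entries / existing_entries * 100  # in percent

print(f"Total entries: {total:,}")
print(f"Relative increase: {relative_increase:.2f}%")
```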