Folk Lexicography

It is commonly assumed that information obtained online is trustworthy, especially when it comes from reference sources such as dictionaries. In decades of translation work at Language Interface, we have found that this assumption is far from reality. Crowdsourcing is an increasingly popular tool, often used online, for achieving specific goals. On any number of websites, users can make suggestions, make corrections or, in some cases, change an entry entirely.

Take the most commonly used English-Russian online dictionaries: Google Translator, Yandex Translator, the Multitran dictionary and the ABBYY Lingvo dictionary (with roughly 820, 45, 14.7 and 3 million visitors per month, respectively). All of these dictionaries allow users to participate in the creation or correction of lexical entries. Historically, dictionaries were created to serve as sources of reference information. They were developed by professional lexicographers and underwent thorough verification prior to release. In contrast, online dictionaries today often encourage users to take part in dictionary development. Google Translator and Yandex Translator allow users to "suggest translation". Multitran directly lets users add translations for existing headwords and enter new headwords along with their translations. ABBYY Lingvo motivates users to add translations by awarding them the titles of "bronze", "silver" and "gold" members. This practice leads to the emergence of repetitive and incorrect translations. In addition, permission to make entries in the lexical dataset sometimes results in deliberately introduced errors and obscenities.

As an example, there are numerous repetitions in the Lingvo and Multitran dictionaries. In the Lingvo dictionary, for instance, the word "рис" is translated as "rice" four times. Yet a user contributed another entry to this mess by adding the incorrect translation "rise", apparently having confused it with "rice".

The word "squirrel" is translated as "белка" four times, and then the same translation, added by different users, is shown three more times. The word "белка" runs into similar trouble when translated into English: the dictionary displays three "squirrels" back to back, followed by six other user-entered "squirrels" and the wrong translation "quarrel". A user who calls herself Holy Moly added a usage example: "It is a squirrel". What value does this example contribute to the dictionary?

There are numerous user-entered translations in online dictionaries. Often these are not even translations but contextual synonyms. This destroys the integrity of the dictionary as a systematic source of reference data. In crowd-produced dictionaries I have discovered countless entries that are archaic, jargon or regional dialect. There is no satisfactory reason to include such entries unless the dictionary is a specialized one covering jargon, dialects and the like.

The immense popularity of websites such as Yelp and TripAdvisor has proven how powerful crowdsourcing can be and has effectively forced the restaurant and travel industries to pay attention. However, I argue against the use of a tool such as crowdsourcing in professional activities in general and in lexicography in particular. Like mathematical handbooks, dictionaries must provide accurate information, and if they do not, their users may get into trouble. There is a strong reason why aircraft developers and doctors do not use crowdsourcing, and this reason applies equally to dictionaries. What is good enough for finding out about a stay in a luxury hotel in the Bahamas does not suffice in mission-critical areas, where incorrect information may have grave consequences.

The involvement of an unlimited number of individuals in the practice of lexicography transforms the dictionary from a reference source of information into an unsystematic mess of unverified data and, in some cases, into a platform where non-professionals declare their lexicographic views. This could be the difference between your client reading a translated document correctly and agreeing to the deal on the table, or misreading it and costing your company a lot of money.

This shows that the web dictionaries examined here cannot serve as reference sources of data, because they provide information of unpredictable quality with an inconsistent presentation structure.