Document Type: Research Article
Kazan Federal University, Kazan, Russian Federation
Innopolis University, Innopolis, Russian Federation
This paper describes the first stage of a project to create an electronic dictionary that assigns numerical estimates of abstractness and concreteness to Russian words. Our approach integrates data from several sources: text corpora, psycholinguistic experiments, published dictionaries, markers of abstractness (certain suffixes), and a translation of a similar dictionary for English. We describe the method for obtaining the data in detail and provide correlation coefficients calculated using different methods, paying special attention to cases where the methods give inconsistent results. We discuss the statistical model behind the experimental data and present the results of experiments with the Google Books Ngram corpus on the co-occurrence of concrete words. Possible applications of the dictionary are illustrated by the frequency with which words from the dictionary are used in Russian high school textbooks.