
Results for data analysis

Outliers detection by voting method during hierarchical data clustering

A.A. Rybakov Joint Supercomputer Center of the Russian Academy of Sciences – branch of Federal State Institution «Scientific Research Institute for System Analysis of the Russian Academy of Sciences», Moscow, Russian Federation;
S.S. Shumilin Joint Supercomputer Center of the Russian Academy of Sciences – branch of Federal State Institution «Scientific Research Institute for System Analysis of the Russian Academy of Sciences», Moscow, Russian Federation;

The article was published in issue №3

Extracting useful information from large volumes of raw data is now a common task. This process, known as Data Mining, combines various approaches to data analysis and processing, but it always begins with the same step: data cleansing. Raw data entering the analysis are often incomplete and weakly structured, and they contain duplicate information and anomalies. Anomalies in the input data can lead to incorrect interpretation of the extracted information and to prediction errors, and they can greatly reduce the value of the knowledge obtained. Developing new approaches to eliminating anomalies, or outliers, is therefore a pressing task. This article discusses an approach to outlier detection that is based on hierarchical data clustering and uses a voting method to identify the most likely outlier candidates.
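The abstract does not spell out the authors' algorithm. Purely to illustrate the general idea of combining a clustering hierarchy with voting, the Python sketch below assumes one possible scheme: a naive single-linkage hierarchy is cut at several levels, each level "votes" for points that end up in very small clusters, and points collecting a majority of votes are flagged as outliers. The function names (`agglomerative_snapshots`, `vote_outliers`), the cut levels, the cluster-size threshold and the majority rule are all hypothetical choices, not taken from the article.

```python
import math
from collections import Counter

# Hypothetical scheme for illustration only; not the method from the article.


def agglomerative_snapshots(points, levels):
    """Naive single-linkage agglomerative clustering (O(n^3), fine for a demo).

    Returns {k: clusters} for every k in `levels`, where `clusters` is the
    partition of point indices at the moment exactly k clusters remain.
    """
    clusters = [[i] for i in range(len(points))]
    snapshots = {}

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while clusters:
        if len(clusters) in levels:
            snapshots[len(clusters)] = [list(c) for c in clusters]
        if len(clusters) == 1:
            break
        # Find and merge the two closest clusters.
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i].extend(clusters[j])
        del clusters[j]
    return snapshots


def vote_outliers(points, levels=(2, 3, 4), small=1, min_votes=None):
    """Each hierarchy level votes for points in clusters of size <= `small`.

    Points with at least `min_votes` votes (default: a simple majority of
    the levels actually reached) are reported as outliers.
    """
    snapshots = agglomerative_snapshots(points, set(levels))
    votes = Counter()
    for clusters in snapshots.values():
        for cluster in clusters:
            if len(cluster) <= small:
                votes.update(cluster)  # one vote per member of a small cluster
    if min_votes is None:
        min_votes = len(snapshots) // 2 + 1
    return sorted(i for i, v in votes.items() if v >= min_votes), votes


# Two tight groups of four points plus one distant point (index 8).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (30, 30)]
outliers, votes = vote_outliers(points)
print(outliers)            # [8]
print(votes[7], votes[8])  # 1 3
```

In this toy data, point 7 is briefly isolated at one level while its group is still being merged, but only the distant point 8 stays isolated at a majority of levels. This shows how voting across several levels of the hierarchy can suppress candidates that look anomalous at just one level.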