Results for data analysis
The article was published in issue №3
Extracting useful information from large volumes of raw data is now a common task. This process, known as data mining, combines various approaches to analyzing and processing data, but it always begins with one specific step: data cleansing. The raw data entering the analysis are often incomplete and weakly structured, and they contain duplicate information and anomalies. Anomalies in the input data can lead to incorrect interpretation of the extracted information and to prediction errors, and they can greatly reduce the value of the knowledge obtained. Developing new approaches to eliminating anomalies, or outliers, is therefore a pressing task. This article discusses an approach to outlier detection that is based on hierarchical data clustering and uses a voting method to identify the most likely outlier candidates.
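The general idea of combining hierarchical clustering with voting can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm, the voting scheme, or any parameters, so everything below is an assumption made for illustration: a naive agglomerative clustering is run once per linkage criterion, each run "votes" for the points that end up in very small clusters, and points with enough votes are reported. The names `agglomerate`, `vote_outliers`, `n_clusters`, `max_size`, and `min_votes` are hypothetical and are not taken from the article.

```python
from itertools import combinations
from math import dist


def agglomerate(points, n_clusters, linkage):
    """Naive agglomerative clustering; returns clusters as lists of point indices."""
    # How to turn all pairwise distances between two clusters into one number.
    agg = {"single": min, "complete": max,
           "average": lambda d: sum(d) / len(d)}[linkage]
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: agg([dist(points[i], points[j])
                                for i in clusters[ab[0]]
                                for j in clusters[ab[1]]]),
        )
        clusters[a] += clusters[b]  # a < b, so deleting b keeps a valid
        del clusters[b]
    return clusters


def vote_outliers(points, n_clusters=3, max_size=1, min_votes=2,
                  linkages=("single", "complete", "average")):
    """Assumed voting rule: one vote per linkage that isolates the point."""
    votes = [0] * len(points)
    for linkage in linkages:
        for cluster in agglomerate(points, n_clusters, linkage):
            if len(cluster) <= max_size:
                for i in cluster:
                    votes[i] += 1
    return [i for i, v in enumerate(votes) if v >= min_votes]


# Two tight groups and one distant point (index 8).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
print(vote_outliers(points))
```

In this toy data set all three linkage criteria leave the distant point alone in its own cluster, so it collects three votes and is the only index reported. Requiring agreement between several clusterings is one way to make the result less sensitive to the choice of a single linkage criterion; the article itself may vote over different quantities.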