When asked about the biggest challenges in making data “analytics-ready,” 55% of respondents cited integrating data from different platforms, followed by transforming, cleansing, and formatting incoming data (39%), integrating relational and non-relational data (32%), and the sheer volume of data that must be managed at any given time (21%). Hundreds of practicing data miners and statistical modelers, most of them supporting extensive analytics projects at major corporations, have reported that they spend 80% of their effort manipulating data so that it can be analyzed.
What types of issues do companies face when dealing with customer data?
For unstructured data, the first and main issue is “data parsing”: extracting structured data from poorly formatted sources or from mixtures of text, images, voice, video, and so on. In most cases, this task requires custom effort.
For structured data, the main issues include missing data, duplicate data, wrong data (practically impossible values, text instead of numbers, etc.), incorrectly merged data (two cells inside one, etc.), and statistical outliers. In addition, if value ranges differ significantly between variables, modeling and visualization can become difficult in practice.
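To make these categories concrete, here is a minimal Python sketch of a quality audit over a small table. It is an illustration only, not how any particular product works: the function name `audit_rows`, the `(timestamp, value)` row layout, and the plausible range of -40 to 60 are all assumptions chosen for the example.

```python
def audit_rows(rows, lo, hi):
    """Count common quality problems in (timestamp, value) rows.

    lo and hi bound the physically plausible values; both are
    assumptions supplied by the caller, not derived from the data.
    """
    report = {"missing": 0, "non_numeric": 0, "out_of_range": 0, "duplicate_rows": 0}
    seen = set()
    for row in rows:
        # An exact repeat of an earlier row counts as duplicate data.
        if row in seen:
            report["duplicate_rows"] += 1
            continue
        seen.add(row)
        _, cell = row
        # Empty cells are missing data.
        if cell is None or cell == "":
            report["missing"] += 1
            continue
        # Text where a number is expected is wrong data.
        try:
            number = float(cell)
        except (TypeError, ValueError):
            report["non_numeric"] += 1
            continue
        # Numbers outside the plausible range are practically impossible values.
        if not lo <= number <= hi:
            report["out_of_range"] += 1
    return report


# Hypothetical temperature readings, invented for this example.
rows = [
    ("09:00", 21.5),
    ("09:01", None),
    ("09:02", "n/a"),
    ("09:03", -999.0),
    ("09:03", -999.0),
    ("09:04", 22.1),
]
report = audit_rows(rows, lo=-40.0, hi=60.0)
print(report)
```

Each counter maps to one of the issue types listed above. Statistical outliers and merged cells need more context than a single pass like this can provide.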
Clarity SaaS from Alchemy IoT offers a cloud-based, highly automated data pre-processing and transformation capability. Clarity identifies and reports on low-quality data, imputes missing values, finds and eliminates statistical outliers or points to data anomalies, and normalizes data to a similar value range, which dramatically improves data analysis and visualization.
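For readers unfamiliar with these steps, the sketch below shows one common way each can be done in plain Python. The specific techniques (median imputation, Tukey's interquartile-range fences, and min-max scaling) are assumptions picked for illustration; the article does not say which methods Clarity itself uses, and the function names are hypothetical.

```python
from statistics import median, quantiles


def impute_missing(values):
    """Fill None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def drop_outliers(values, k=1.5):
    """Keep only points inside Tukey's fences: [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if low <= v <= high]


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series has no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sensor readings with one gap and one spike.
readings = [10.0, 12.0, None, 11.0, 13.0, 500.0, 12.0]
filled = impute_missing(readings)
cleaned = drop_outliers(filled)
normalized = min_max_normalize(cleaned)
print(cleaned)
print(normalized)
```

The order matters in this sketch: imputing first gives the outlier step a complete series, and normalizing last keeps the spike from compressing every other value toward zero.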
To give customers feedback, Clarity uses its proprietary, patented Data Quality Index (DQI), which assesses data imperfections in real time and reports data quality as a value from 1 (100% good) to 0 (all bad).
Finally, Clarity calculates and reports “Data Entropy,” a measure of data randomness. When the data is highly regular, repeating, or periodic, low entropy is reported. When the data is random, non-repeating, or chaotic, the reported entropy increases. This gives customers additional insight into their data and helps them understand its quality and behavior.
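The article does not define how “Data Entropy” is computed. As a rough intuition only, the sketch below uses Shannon entropy over value frequencies, which is an assumption and just one of several possible measures. It ignores the order of the values, so it is a simple proxy for regularity rather than a full test for periodicity.

```python
from collections import Counter
from math import log2


def shannon_entropy(symbols):
    """Shannon entropy, in bits, of the value frequencies in a sequence."""
    counts = Counter(symbols)
    total = len(symbols)
    # Sum p * log2(1/p) over the observed values.
    return sum((c / total) * log2(total / c) for c in counts.values())


# Hypothetical series, invented for this example.
constant = [5, 5, 5, 5, 5, 5, 5, 5]      # one value only
alternating = [0, 1, 0, 1, 0, 1, 0, 1]   # two values, evenly split
scattered = [3, 7, 1, 8, 2, 9, 4, 6]     # every value different

print(shannon_entropy(constant))
print(shannon_entropy(alternating))
print(shannon_entropy(scattered))
```

Under this measure, the constant series scores lowest and the series with no repeated values scores highest, which matches the low-to-high pattern described above.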
To perform the above functions, all Clarity needs is the data. There are several ways to deliver this data to Clarity:
● Via IoT Cloud (which is a popular way for those using IoT sensors and devices);
● Via a direct connection over the Web to popular data stores such as historians, MongoDB, SQL databases, etc. As long as the customer’s on-premises or cloud application supports an API, Clarity can reach that data and process it;
● By a direct data transfer from the application to Clarity using an HTTPS command (exact syntax will be provided).