Converting Data into Relevant Formats
What is Feature Engineering?
Feature engineering is the process of using domain knowledge to extract features (measurable properties or derived variables) from raw data through data mining techniques. It is often described as more art than science, because deciding which features to create depends on understanding the data and its domain; well-chosen features increase the predictive power of machine learning algorithms. It is one of the fundamental building blocks of Tibil’s iDEA framework, in which we transform the customer’s data using our in-depth knowledge of the customer’s domain.
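As a minimal illustration of "extracting features from raw data", the Python sketch below turns a single raw timestamp into three features a model can use directly. The function name `derive_time_features` and the chosen features are hypothetical examples, not part of Tibil’s iDEA framework.

```python
from datetime import datetime


def derive_time_features(timestamp: str) -> dict:
    """Turn a raw ISO-8601 timestamp string into model-ready features (illustrative only)."""
    ts = datetime.fromisoformat(timestamp)
    return {
        "hour": ts.hour,                  # time of day, e.g. to capture daily load cycles
        "day_of_week": ts.weekday(),      # Monday == 0 ... Sunday == 6
        "is_weekend": ts.weekday() >= 5,  # True for Saturday or Sunday
    }


print(derive_time_features("2023-07-15T18:30:00"))
```

A raw string is useless to most models as-is, whereas the hour, weekday and weekend flag are numeric or boolean values that can expose patterns such as weekday versus weekend usage.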
Common Challenges in Feature Engineering
In several engagements, we found that customer data sets consisted mainly of messy, unstructured data that then had to be processed, cleaned and stored. Such data can contain missing values, outliers or redundant records. Because data is collected from a wide variety of sources, some redundancy is practically unavoidable.
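The three data problems named above can be handled in a few lines. The sketch below is a minimal example, assuming readings arrive as hypothetical `(sensor_id, value)` tuples with `None` for a missing value. It removes exact duplicates, fills gaps with the median and flags outliers with the common 1.5 × IQR rule; median imputation and the IQR rule are standard choices picked for illustration, not necessarily the methods used in these engagements.

```python
from statistics import median, quantiles


def clean_readings(rows, k=1.5):
    """Deduplicate, impute and flag outliers in (sensor_id, value) records.

    rows: list of (sensor_id, value) tuples; value is None when missing.
    Returns (cleaned_rows, outlier_sensor_ids).
    """
    # 1. Drop exact duplicate records (redundant data), keeping the first copy.
    seen, unique = set(), []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)

    # 2. Impute missing values with the median of the observed values.
    observed = [v for _, v in unique if v is not None]
    fill = median(observed)
    cleaned = [(sid, fill if v is None else v) for sid, v in unique]

    # 3. Flag outliers with the interquartile-range rule:
    #    values outside [Q1 - k*IQR, Q3 + k*IQR] are treated as suspect.
    q1, _, q3 = quantiles([v for _, v in cleaned], n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    outliers = [sid for sid, v in cleaned if v < low or v > high]
    return cleaned, outliers


rows = [("s1", 230.0), ("s2", 231.0), ("s2", 231.0),  # s2 is duplicated
        ("s3", None), ("s4", 229.0), ("s5", 900.0)]   # s3 missing, s5 suspicious
cleaned, outliers = clean_readings(rows)
print(cleaned)
print(outliers)
```

Outliers are only flagged, not deleted, because an extreme reading may be a genuine event worth investigating rather than an error.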
When working with data, the common challenges we encounter include:
- Analyzing huge volumes of data from disparate sources
- Making data available for analytics in patterns that tools and statistical models can recognize
- Overlaying business context and processes on those patterns so that the analysis is relevant and delivers business insights
- Visualizing the outcome of data analytics in appealing, relevant dashboards for quick decision-making
- Improving the time-to-value of the data analytics process
Example of Feature Engineering
For an Energy/Utilities customer in North America, Tibil created a comprehensive, cloud-based data management system for collecting, normalizing and storing data; applied proven statistical models to analyze the data; and developed a sophisticated reporting system that geo-maps all sensors. The analysis yielded valuable insights into key performance parameters such as the percentage of sensors violating industry standards, the number of sensors with a high frequency of outages and brownouts, and the sensors with a high standard deviation.
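The three parameters above are themselves engineered features: per-sensor summaries derived from raw readings. The sketch below shows one way they could be computed, assuming each sensor reports a list of voltage readings. The voltage band, the event limit, the standard-deviation cut-off and the rule "a reading of 0 is an outage, anything between 0 and the lower limit is a brownout" are all hypothetical values chosen for illustration; they are not the customer’s actual thresholds or any specific industry standard.

```python
from statistics import pstdev

# Hypothetical thresholds, for illustration only; real limits would come from
# the applicable industry standard and the customer's own definitions.
LOW, HIGH = 114.0, 126.0   # acceptable voltage band (assumed)
MAX_EVENTS = 2             # outage/brownout events tolerated per sensor (assumed)
MAX_STD = 3.0              # standard-deviation cut-off for an unstable sensor (assumed)


def sensor_kpis(readings):
    """readings: dict mapping sensor_id -> list of voltage readings."""
    violating, frequent_events, high_std = [], [], []
    for sid, values in readings.items():
        # A sensor "violates the standard" if any reading leaves the band.
        if any(v < LOW or v > HIGH for v in values):
            violating.append(sid)
        # Readings below LOW count as events: 0 is an outage, 0 < v < LOW a brownout.
        events = sum(1 for v in values if v < LOW)
        if events > MAX_EVENTS:
            frequent_events.append(sid)
        # Population standard deviation as a simple volatility measure.
        if pstdev(values) > MAX_STD:
            high_std.append(sid)
    return {
        "pct_violating": 100.0 * len(violating) / len(readings),
        "n_frequent_events": len(frequent_events),
        "high_std_sensors": high_std,
    }


readings = {
    "A": [120, 121, 119, 120],   # stable and within the band
    "B": [120, 0, 0, 110, 120],  # two outages and one brownout
    "C": [115, 125, 115, 125],   # within the band but swinging widely
}
print(sensor_kpis(readings))
```

Here sensor B is the only one outside the band and the only one with frequent events, while both B and C exceed the standard-deviation cut-off.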
Benefits of Automated Feature Engineering
We believe that by leveraging our Feature Engineering solution, our customers can benefit from:
- Rapid time-to-value: Derived variables, rather than raw data, are fed into the analytics engine, speeding up the data analytics process
- Fit-for-purpose analytics: Formatted data is made available according to the business context or process, so that the analytics process drives business value
We have helped customers tackle data redundancy through feature selection: after the features are categorized or selected, any feature that is highly correlated with another feature is removed, eliminating the redundant data.
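Removing highly correlated features can be sketched as a greedy filter: keep a feature only if its absolute Pearson correlation with every feature already kept stays at or below a threshold. The function names, the 0.9 threshold and the sample data are hypothetical, and this is a generic technique rather than Tibil’s exact procedure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # correlation is undefined for a constant feature
    return cov / (sx * sy)


def drop_correlated(features, threshold=0.9):
    """Return the names of features kept after removing highly correlated ones.

    features: dict mapping feature name -> list of values.
    Earlier entries take priority: a later feature is dropped if it is
    correlated above the threshold with any feature already kept.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


features = {
    "temp_c": [20, 22, 25, 30],
    "temp_f": [68, 71.6, 77, 86],   # same signal in another unit: redundant
    "humidity": [80, 60, 75, 40],
}
print(drop_correlated(features))
```

Because the filter is greedy, dictionary order decides which member of a correlated pair survives, so the features with the most business meaning should be listed first.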