Feature Engineering

Converting Data into Relevant Formats

What is Feature Engineering?

Feature engineering is the process of using domain knowledge to extract features from raw data through data mining techniques. It is more an art than a science: it relies on knowledge of the domain to create features that increase the predictive power of machine learning algorithms. Well-engineered features represent the underlying structure of the data accurately and therefore help you build a better model. Features can be engineered by decomposing or splitting existing features, by drawing on external data sources, or by aggregating or combining features to create new ones. Automated feature engineering prepares a dataset for machine learning by changing features or deriving new ones to improve model performance. It is one of the fundamental blocks of Tibil’s iDEA framework, in which we transform the customer’s data using our in-depth knowledge of the customer’s domain.
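To make decomposition and combination concrete, here is a minimal Python sketch. The order record and its field names (ordered_at, total_price, item_count) are hypothetical and chosen only for illustration; they are not taken from any customer data set.

```python
from datetime import datetime

def engineer_order_features(order):
    """Derive new features from one raw order record (hypothetical schema)."""
    ts = datetime.fromisoformat(order["ordered_at"])
    return {
        # Decomposition: split one timestamp into parts a model can use.
        "order_hour": ts.hour,
        "order_weekday": ts.weekday(),  # Monday = 0, Sunday = 6
        "is_weekend": ts.weekday() >= 5,
        # Combination: merge two raw columns into a single ratio feature.
        "price_per_item": order["total_price"] / order["item_count"],
    }

raw_order = {"ordered_at": "2023-07-15T21:30:00", "total_price": 90.0, "item_count": 4}
features = engineer_order_features(raw_order)
print(features)
```

Which derived features actually help depends on the domain, which is why this step leans on domain knowledge rather than on a fixed recipe.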

Common Challenges in Feature Engineering

In several engagements we found that customer data sets primarily contained messy, unstructured data that had to be processed, cleaned and stored. Such data can contain missing values, outliers or redundant entries. When data is collected from a wide variety of sources, redundancy becomes unavoidable.

While working on data, some of the common challenges we encounter are:

  • Analysing huge volumes of data from disparate sources
  • Making data available for analytics in patterns that tools or statistical models can recognize
  • Overlaying business context and process on those patterns so that the analysis is relevant and delivers business insights
  • Visualizing the outcome of data analytics in appealing and relevant dashboards for quick decision making
  • Improving the time-to-value of the data analytics process

Tibil’s Feature Engineering Solution

We have helped customers address data redundancy through feature selection: after the candidate features are categorized, any feature that is highly correlated with another is removed, which eliminates the redundancy. We have also used feature generation to create new features from one or more existing features for use in statistical analysis. Feature generation makes new information available during model construction, which can result in a more accurate model. Over the years, Tibil has gained significant domain knowledge, primarily in the verticals of BFSI, Retail, Healthcare, Education, Energy/Utilities and Manufacturing.
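The correlation-based selection step described above can be sketched as follows. This is an illustrative filter and not Tibil's implementation: the 0.95 threshold, the column names and the greedy "keep the first of each correlated pair" rule are all assumptions.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # treat a constant column as uncorrelated
    return cov / sqrt(var_x * var_y)

def drop_correlated(columns, threshold=0.95):
    """Keep a column only if it is not highly correlated with one already kept."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return kept

columns = {
    "amount_usd": [10.0, 20.0, 30.0, 40.0],
    "amount_cents": [1000.0, 2000.0, 3000.0, 4000.0],  # redundant with amount_usd
    "items": [3.0, 1.0, 4.0, 2.0],
}
selected = drop_correlated(columns)
print(list(selected))
```

Because the filter is greedy, the column order decides which member of a correlated pair survives; in practice that choice is usually guided by business meaning or data quality.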

We have leveraged this domain expertise to create a feature engineering process that involves:

  • Brainstorming or testing data features
  • Deciding what data features to create and creating them
  • Checking how the data features work with the data analytics model
  • Improving the data features further if needed
  • Going back to brainstorming or creating more data features

In the banking and financial technology sector, data from sources such as transaction data, client data and credit bureau data is aggregated and aligned with the business objective, and feature engineering is then performed to generate the most predictive features.
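As a rough illustration of the aggregation step, the sketch below rolls raw transactions up into one feature row per client. The record layout (client_id, amount) and the chosen aggregates are hypothetical; a real engagement would select them against the business objective.

```python
from collections import defaultdict

def aggregate_transactions(transactions):
    """Roll raw transactions up into one feature row per client."""
    amounts = defaultdict(list)
    for tx in transactions:
        amounts[tx["client_id"]].append(tx["amount"])
    return {
        client_id: {
            "tx_count": len(values),
            "tx_total": sum(values),
            "tx_mean": sum(values) / len(values),
            "tx_max": max(values),
        }
        for client_id, values in amounts.items()
    }

transactions = [
    {"client_id": "c1", "amount": 120.0},
    {"client_id": "c1", "amount": 80.0},
    {"client_id": "c2", "amount": 500.0},
]
client_features = aggregate_transactions(transactions)
print(client_features["c1"])
```

The resulting per-client rows could then be joined with client and credit bureau attributes on the same key before modelling.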

Our experience in feature engineering has shown that a machine learning model is more efficient and accurate when we select the right, non-redundant data features. In machine learning, a model is only as good as the data it is trained on. With many of our customers, we have focused effort on using feature engineering to create a dataset optimized for information density, with strong results.

An important part of text classification is feature engineering, i.e., the process of creating features for a machine learning model from raw text data. There are different methods to analyze text and extract features that can be used to build a classification model. Many deep learning neural networks have data processing, feature extraction and feature engineering built in, so they usually require less manual feature engineering work than other machine learning algorithms.
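One of the simplest ways to turn raw text into features is a bag-of-words count vector. The sketch below is a minimal standard-library version meant only as an illustration; the tokenizer and the sample documents are assumptions, and production work would more likely rely on an established text-processing library.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def bag_of_words(documents):
    """Return a sorted vocabulary and one term-count vector per document."""
    vocabulary = sorted({tok for doc in documents for tok in tokenize(doc)})
    vectors = []
    for doc in documents:
        counts = Counter(tokenize(doc))
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors

docs = ["Payment failed, please retry", "Payment received"]
vocabulary, vectors = bag_of_words(docs)
print(vocabulary)
print(vectors)
```

Each document becomes a fixed-length numeric vector, which is the form a classification model needs as input.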

Example of Feature Engineering

For an Energy/Utilities customer in North America, Tibil created a comprehensive, cloud-based data management system for collecting, normalizing and storing data; used proven statistical models for data analysis; and developed a sophisticated reporting system for geo-mapping of all sensors. The results of our data analytics were valuable insights on key performance parameters such as the percentage of sensors violating industry standards, the number of sensors with a high frequency of outages and brownouts, and the sensors with a high standard deviation.
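Two of these indicators can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the sensor readings, the allowed band (low, high) and the standard-deviation limit are invented, and outage and brownout frequency is left out because it would need event data that this example does not model.

```python
from statistics import pstdev

def sensor_kpis(readings, low, high, max_stdev):
    """Summarise per-sensor readings into fleet-level indicators."""
    # A sensor violates the band if any single reading falls outside it.
    violating = [
        sensor for sensor, values in readings.items()
        if any(v < low or v > high for v in values)
    ]
    # A sensor is unstable if its readings vary more than the allowed limit.
    unstable = [
        sensor for sensor, values in readings.items()
        if pstdev(values) > max_stdev
    ]
    return {
        "pct_violating": 100.0 * len(violating) / len(readings),
        "unstable_sensors": unstable,
    }

readings = {
    "s1": [120.0, 121.0, 119.0],
    "s2": [120.0, 95.0, 130.0],
    "s3": [118.0, 118.0, 118.0],
}
kpis = sensor_kpis(readings, low=114.0, high=126.0, max_stdev=5.0)
print(kpis)
```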

Benefits of Automated Feature Engineering

We believe that by leveraging our Feature Engineering solution, our customers can benefit from:

  • Rapid time-to-value: Derived variables, rather than raw data, are fed into the analytics engine, thereby speeding up the data analytics process
  • Fit-for-purpose analytics: Ensuring the availability of formatted data based on business context or process so that the analytics process drives business value

Besides Feature Engineering, we offer a Data Operations solution to audit data for veracity, completeness and reliability; a Data Engineering solution to help you prepare and normalize data and build data lakes; a Data Analytics solution to analyse data using statistical models and generate dashboards and reports; a Data Science solution to build advanced analytical solutions based on ML algorithms and AI models; and a Data Maturity Assessment to help you baseline your current data posture.

Get in touch with our Feature Engineering experts to learn more