Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering is most important step for any data scientist.
What is a Variable?
A variable is any characteristic, number, or quantity that can be measured or counted. They are called ‘variables’ because the value they take may vary, and it usually does. The following are examples of variables:
- Age (21, 35, 62, …) - Gender (male, female) - Income (GBP 20000, GBP 35000, GBP 45000, …)…
One Hot Encoding
One hot encoding, consists in encoding each categorical variable with different boolean variables (also called dummy variables) which take values 0 or 1, indicating if a category is present in an observation.
For example, for the categorical variable “Gender”, with labels ‘female’ and ‘male’, we can generate the boolean variable “female”, which takes 1 if the person is ‘female’ or 0 otherwise, or we can generate the variable “male”, which takes 1 if the person is ‘male’ and 0 otherwise.
For the categorical variable “colour” with values ‘red’, ‘blue’ and ‘green’, we can create 3 new variables…
“Artificial Neural Network” is derived from biological neural networks that develop the structure of a human brain. Like the human brain, ANN also have neurons that are interconnected to one another in various layers of the networks. These neurons are known as nodes.
As the name suggests, it accepts inputs in several different formats provided by the programmer, or you can say it the feature of data which is provided by programmer.
The hidden layer presents in-between input and output layers. It performs all the calculations to find hidden features and patterns.
Feature scaling refers to the methods or techniques used to normalize the range of independent variables in our data, or in other words, the methods to set the feature value range within a similar scale. Feature scaling is generally the last step in the data preprocessing pipeline, performed just before training the machine learning algorithms.
Label / integers encoding: definition
• This encoding method allows for quick benchmarking of machine learning models.
- Straightforward to implement
- Does not expand the feature space
- Does not capture any information about the categories labels
- Not suitable for linear models.
Integer encoding is better suited for non-linear methods which are able to navigate through the arbitrarily assigned…
Data science enthusiastic