-
Book Overview & Buying
-
Table Of Contents
-
Feedback & Rating

Python Machine Learning By Example
By :

In the last chapter, we briefly mentioned one-hot encoding, which transforms categorical features to numerical features in order to be used in the tree-based algorithms in scikit-learn. It will not limit our choice to tree-based algorithms if we can adopt this technique to other algorithms that only take in numerical features.
The simplest solution we can think of to transform a categorical feature with k possible values is to map it to a numerical feature with values from 1 to k. For example, [Tech, Fashion, Fashion, Sports, Tech, Tech, Sports
] becomes [1, 2, 2, 3, 1, 1, 3
]. However, this will impose an ordinal characteristic, such as Sports
is greater than Tech
, and a distance property, such as Sports
, is closer to Fashion
than to Tech
.
Instead, one-hot encoding converts the categorical feature to k binary features. Each binary feature indicates presence or not of a corresponding possible value. So the preceding example becomes...