- Python Machine Learning Cookbook (Second Edition)
- Giuseppe Ciaburro, Prateek Joshi
- 2021-06-24 15:40:33
How to do it…
Let's see how to carry out label encoding in Python:
- Create a new Python file and import the preprocessing package:
>> from sklearn import preprocessing
- This package contains various functions and classes that are needed for data preprocessing. To encode labels with a value between 0 and n_classes-1, the preprocessing.LabelEncoder class can be used. Let's define the label encoder, as follows:
>> label_encoder = preprocessing.LabelEncoder()
- Once fitted, the label_encoder object maps word labels to integers. Let's create some labels:
>> input_classes = ['audi', 'ford', 'audi', 'toyota', 'ford', 'bmw']
- We are now ready to encode these labels. First, the fit() method is used to fit the label encoder, and then the resulting class mapping is printed:
>> label_encoder.fit(input_classes)
>> print("Class mapping: ")
>> for i, item in enumerate(label_encoder.classes_):
...     print(item, "-->", i)
- Run the code, and you will see the following output on your Terminal:
Class mapping:
audi --> 0
bmw --> 1
ford --> 2
toyota --> 3
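The same mapping can also be held in a plain dictionary, which is handy for lookups. The following is a minimal sketch rather than part of the recipe; the name class_to_index is chosen here for illustration. It relies on classes_ storing the unique labels in sorted order, so each label's position is its code:

```python
from sklearn import preprocessing

input_classes = ['audi', 'ford', 'audi', 'toyota', 'ford', 'bmw']

label_encoder = preprocessing.LabelEncoder()
label_encoder.fit(input_classes)

# classes_ holds the unique labels in sorted order, so the index of each
# label in that array is the integer code assigned to it.
# class_to_index is an illustrative name, not part of scikit-learn.
class_to_index = {str(label): index
                  for index, label in enumerate(label_encoder.classes_)}

print(class_to_index)
```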
- As shown in the preceding output, the words have been transformed into zero-indexed numbers, assigned in sorted (alphabetical) order of the labels. Now, when you encounter a set of labels, you can simply transform them, as follows:
>> labels = ['toyota', 'ford', 'audi']
>> encoded_labels = label_encoder.transform(labels)
>> print("Labels =", labels)
>> print("Encoded labels =", list(encoded_labels))
Here is the output that you'll see on your Terminal:
Labels = ['toyota', 'ford', 'audi']
Encoded labels = [3, 2, 0]
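Note that transform() only accepts labels that were present when fit() was called; a label the encoder has never seen raises a ValueError. The following sketch shows one possible way to guard against that. The helper safe_transform and the label 'honda' are assumptions introduced for illustration, not part of the recipe or of scikit-learn:

```python
from sklearn import preprocessing

label_encoder = preprocessing.LabelEncoder()
label_encoder.fit(['audi', 'ford', 'audi', 'toyota', 'ford', 'bmw'])


def safe_transform(encoder, labels):
    """Return the encoded labels as a list of ints, or None if any
    label was not seen when the encoder was fitted (hypothetical helper)."""
    try:
        return [int(code) for code in encoder.transform(labels)]
    except ValueError:
        # LabelEncoder.transform() raises ValueError for unseen labels
        return None


print(safe_transform(label_encoder, ['toyota', 'ford', 'audi']))
print(safe_transform(label_encoder, ['toyota', 'honda']))
```

Returning None is only one design choice; depending on the application, you might prefer to let the exception propagate or to refit the encoder on the extended label set.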
- Using the label encoder is much easier than manually maintaining a mapping between words and numbers. You can check its correctness by transforming numbers back into word labels:
>> encoded_labels = [2, 1, 0, 3, 1]
>> decoded_labels = label_encoder.inverse_transform(encoded_labels)
>> print("Encoded labels =", encoded_labels)
>> print("Decoded labels =", list(decoded_labels))
Here, the inverse_transform() method has been applied to map the encoded numbers back to their original word labels. Here is the output:
Encoded labels = [2, 1, 0, 3, 1]
Decoded labels = ['ford', 'bmw', 'audi', 'toyota', 'bmw']
As you can see, the mapping is preserved perfectly.
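The round trip can also be checked programmatically. The following is a minimal sketch, assuming the same car labels as above; it uses fit_transform(), which fits the encoder and encodes the labels in one call, and the variable round_trip_ok is introduced here for illustration:

```python
from sklearn import preprocessing

labels = ['audi', 'ford', 'audi', 'toyota', 'ford', 'bmw']

label_encoder = preprocessing.LabelEncoder()

# fit_transform() learns the mapping and encodes the labels in one step
encoded = label_encoder.fit_transform(labels)

# inverse_transform() maps the integer codes back to the word labels
decoded = label_encoder.inverse_transform(encoded)

# The decoded labels should match the original input exactly
round_trip_ok = list(decoded) == labels
print("Encoded labels =", list(encoded))
print("Round trip preserved:", round_trip_ok)
```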