CatBoost is a gradient boosting library that has earned a place in the Python machine learning toolkit thanks to its speed and accuracy. In this guide, we’ll dive deep into CatBoost in Python 3, covering the fundamentals, advanced techniques, and practical examples, including a hands-on demonstration with a sample dataset and plots. By the end, you’ll be ready to apply CatBoost to your own machine learning projects.
Unveiling CatBoost
CatBoost, short for Categorical Boosting, is a gradient boosting algorithm that has gained a strong following in the machine learning community. What sets it apart is its focus on categorical features: it encodes them internally, so you don’t have to one-hot or label encode them yourself. This is a real advantage when working with real-world data, where categorical variables are the norm.
CatBoost is a powerful gradient boosting algorithm for machine learning that has gained popularity for its ease of use, high predictive accuracy, and robustness in handling categorical features. It is particularly well-suited for classification and regression tasks. Here’s a more detailed explanation of CatBoost:
1. Gradient Boosting Algorithm:
CatBoost is based on gradient boosting, an ensemble learning technique that builds a predictive model by combining the predictions of many base models, typically decision trees. The trees are added one at a time, and each new tree is fitted to the gradient of the loss function with respect to the current predictions, so every round corrects part of the remaining error.
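To make the idea concrete, here is a minimal sketch of gradient boosting for squared-error regression, written in plain Python with one-split trees (stumps). It is a toy illustration of the general technique, not CatBoost's implementation; the data and all function names are made up for this example.

```python
# Toy gradient boosting for squared-error regression on a single feature.
# Illustrative sketch only; CatBoost's real implementation differs.

def fit_stump(xs, residuals):
    """Return (threshold, left_mean, right_mean) minimizing squared error."""
    best = None
    # Skip the largest value: splitting there would leave the right side empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    base = sum(ys) / len(ys)          # start from the mean prediction
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps, learning_rate


def predict_boosting(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Hypothetical toy data
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1, 6.8, 8.2]

model = fit_boosting(xs, ys)
baseline_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mse(ys, [predict_boosting(model, x) for x in xs])
print(f"baseline MSE: {baseline_mse:.3f}, boosted MSE: {boosted_mse:.3f}")
```

Each round fits a stump to what the current ensemble still gets wrong and adds a scaled-down version of it, which is the same loop that production libraries run with deeper trees and more general loss functions.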
2. Handling Categorical Features:
One of CatBoost’s standout features is its ability to handle categorical features without manual encoding. Many machine learning algorithms require one-hot encoding or label encoding for categorical data, which can be cumbersome and can lead to increased dimensionality. CatBoost, however, can work directly with categorical features: you only need to tell it which columns are categorical, and it encodes them internally. This makes it more convenient for real-world datasets where categorical variables are common.
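Internally, CatBoost encodes categories with target statistics computed in an "ordered" fashion: each row's encoding uses only the rows that come before it in a random permutation, which avoids leaking the row's own label. The sketch below is a simplified plain-Python illustration of that idea under stated assumptions (a single permutation, a hypothetical `prior_weight` of 1, made-up data); it is not CatBoost's actual code.

```python
import random


def ordered_target_stats(categories, targets, prior, prior_weight=1.0, seed=0):
    """Encode each row's category using only rows seen earlier in a random order."""
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)   # the random permutation of rows

    sums, counts = {}, {}                # running target sum / count per category
    encoded = [0.0] * len(categories)
    for idx in order:
        cat = categories[idx]
        s = sums.get(cat, 0.0)
        c = counts.get(cat, 0)
        # Smoothed mean of the targets of *previous* rows with this category.
        encoded[idx] = (s + prior_weight * prior) / (c + prior_weight)
        # Only now does the current row's label enter the running statistics.
        sums[cat] = s + targets[idx]
        counts[cat] = c + 1
    return encoded


# Hypothetical toy data
colors = ["red", "blue", "red", "green", "blue", "red"]
labels = [1, 0, 1, 0, 1, 0]
prior = sum(labels) / len(labels)        # global mean of the target

encoded = ordered_target_stats(colors, labels, prior)
for color, value in zip(colors, encoded):
    print(f"{color:>5} -> {value:.3f}")
```

A category seen for the first time simply receives the prior, which is why the single "green" row is encoded as the global mean.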
3. Efficient Learning:
CatBoost is designed for efficiency and speed. It includes several techniques that reduce overfitting and speed up training and prediction. These include ordered boosting, oblivious (symmetric) trees, and automatically generated combinations of categorical features.
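An oblivious tree applies the same split condition to every node at a given depth, so finding a leaf reduces to computing one bit per level. The following plain-Python sketch shows that lookup; the split thresholds and leaf values are invented for illustration and do not come from a trained model.

```python
def oblivious_tree_predict(row, splits, leaf_values):
    """Predict with an oblivious tree.

    splits: list of (feature_index, threshold), one pair per tree level.
    leaf_values: list of 2 ** len(splits) values, indexed by the split bits.
    """
    leaf_index = 0
    for feature_index, threshold in splits:
        bit = 1 if row[feature_index] > threshold else 0
        leaf_index = (leaf_index << 1) | bit   # append this level's answer
    return leaf_values[leaf_index]


# Hypothetical depth-2 tree: level 0 tests feature 0, level 1 tests feature 1.
splits = [(0, 2.5), (1, 0.8)]
leaf_values = [0.1, 0.4, -0.3, 0.9]            # 2 ** 2 = 4 leaves

rows = [[1.0, 0.5], [1.0, 1.0], [3.0, 0.5], [3.0, 1.0]]
predictions = [oblivious_tree_predict(row, splits, leaf_values) for row in rows]
print(predictions)
```

Because every row goes through the same fixed sequence of comparisons, this structure is easy to vectorize, which is one reason symmetric trees are fast at prediction time.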
4. Regularization:
CatBoost applies L2 regularization to the values in the tree leaves, controlled by the l2_leaf_reg parameter. The penalty shrinks leaf values toward zero, which helps control overfitting and improves the model’s robustness and generalization.
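The effect of the penalty is easiest to see for squared-error loss, where the regularized leaf value has a closed form: the sum of residuals divided by the number of samples plus the penalty. The sketch below uses that textbook formula; it is a simplification and is not claimed to match CatBoost's exact leaf estimation.

```python
def leaf_value(residuals, l2_leaf_reg):
    """Regularized leaf estimate for squared-error loss.

    Minimizing 0.5 * sum((r - w) ** 2) + 0.5 * l2_leaf_reg * w ** 2 over w
    gives w = sum(r) / (n + l2_leaf_reg).
    """
    return sum(residuals) / (len(residuals) + l2_leaf_reg)


# Hypothetical residuals that ended up in one leaf
residuals = [2.0, 1.0, 3.0]

unregularized = leaf_value(residuals, 0.0)   # plain mean of the residuals
regularized = leaf_value(residuals, 3.0)     # shrunk toward zero

print(unregularized, regularized)
```

With few samples in a leaf the penalty dominates and the value is pulled strongly toward zero; with many samples it barely matters.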
5. Built-in Cross-Validation:
CatBoost simplifies model evaluation and hyperparameter tuning by offering a built-in cross-validation function, cv. This makes it easier to compare hyperparameter settings on your specific dataset.
6. Default Parameter Tuning:
CatBoost is known for its well-tuned default hyperparameters. This means that even with minimal tuning, you can often achieve competitive results. This can be a time-saver for machine learning practitioners.
7. Support for Classification and Regression:
CatBoost is versatile and can be used for both classification and regression tasks. It can predict class labels and continuous values, making it suitable for a wide range of applications.
8. Integration with Popular Libraries:
CatBoost is well-integrated with popular Python libraries for data manipulation and analysis, such as Pandas and NumPy. This makes it easy to incorporate CatBoost into your existing machine learning workflow.
9. Model Interpretability:
CatBoost also provides tools for understanding model predictions. You can examine feature importances, for example with a trained model’s get_feature_importance method, to determine which features are most influential in making predictions.
10. Active Development and Community Support:
CatBoost was created by Yandex and is actively developed as an open-source project. You can find extensive documentation, tutorials, and community support to help you get started and solve any issues you may encounter.
In summary, CatBoost is a powerful and efficient gradient boosting algorithm that simplifies the handling of categorical features and provides strong out-of-the-box performance. It’s an excellent choice for both beginners and experienced data scientists working on a variety of machine learning tasks, and it has found applications in fields like finance, healthcare, and e-commerce, among others. If you’re looking for a reliable and user-friendly algorithm to boost your machine learning projects, CatBoost is a solid choice.
Why Choose CatBoost?
CatBoost offers several compelling reasons to be your go-to choice for machine learning projects:
- Categorical Features Handling: CatBoost can naturally handle categorical features without the need for one-hot encoding or label encoding. This simplifies the data preparation process and often leads to better results.
- Speed: CatBoost is engineered for efficiency. Its symmetric trees make prediction fast, and training is often competitive with other gradient boosting libraries, which is a big advantage when dealing with large datasets.
- Model Accuracy: Thanks to its careful handling of categorical features and its built-in regularization, CatBoost often achieves excellent predictive accuracy.
- Built-in Cross-Validation: CatBoost comes with a built-in cross-validation method that simplifies model tuning and selection.
- Great Out-of-the-Box Performance: CatBoost’s default hyperparameters are well-tuned, making it an attractive choice for quick experimentation.
Getting Started with CatBoost
Before we embark on our journey into the world of CatBoost, let’s ensure you have Python 3.x installed on your system. You can install CatBoost using pip:
pip install catboost
With CatBoost installed, let’s import the necessary libraries to kickstart our learning journey:
import numpy as np
import pandas as pd
import catboost
import matplotlib.pyplot as plt
from catboost import CatBoostClassifier, Pool
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
The Dataset
For our hands-on exploration of CatBoost, we’ll use the Iris dataset, a classic dataset in the world of machine learning. It contains four measurements (sepal length, sepal width, petal length, and petal width) for 150 flowers from three iris species. Let’s load the Iris dataset and take a peek at the first few rows:
from sklearn.datasets import load_iris
iris = load_iris(as_frame=True)
df = iris.frame
print(df.head())
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target
0                5.1               3.5                1.4               0.2       0
1                4.9               3.0                1.4               0.2       0
2                4.7               3.2                1.3               0.2       0
3                4.6               3.1                1.5               0.2       0
4                5.0               3.6                1.4               0.2       0
Data Exploration
Data exploration is the starting point for any machine learning project. It helps us understand the data’s characteristics. For the Iris dataset, let’s begin with basic statistics:
print(df.describe())
       sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)      target
count         150.000000        150.000000         150.000000        150.000000  150.000000
mean            5.843333          3.057333           3.758000          1.199333    1.000000
std             0.828066          0.435866           1.765298          0.762238    0.819232
min             4.300000          2.000000           1.000000          0.100000    0.000000
25%             5.100000          2.800000           1.600000          0.300000    0.000000
50%             5.800000          3.000000           4.350000          1.300000    1.000000
75%             6.400000          3.300000           5.100000          1.800000    2.000000
max             7.900000          4.400000           6.900000          2.500000    2.000000
Data Preprocessing
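Before we train a model, we would normally handle missing values, identify any categorical features, and split the dataset into training and testing sets. The Iris dataset has no missing values and all four features are numeric, so the first step is only a safeguard here and the split does the real work. Let’s tackle these steps: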
# Handle missing values if any
df.dropna(inplace=True)
# Split the data into features (X) and target (y)
X = df.drop('target', axis=1)
y = df['target']
# Split the dataset into a training set and a testing set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
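As an optional variation on the split above, the sketch below first confirms that the data has no missing values and then passes `stratify=y` so that each species keeps the same share in the test set. This is a suggested alternative, not the split used in the rest of this guide.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

df = load_iris(as_frame=True).frame

# Count missing values per column before deciding whether dropna is needed
missing_per_column = df.isna().sum()
print(missing_per_column)

X = df.drop("target", axis=1)
y = df["target"]

# stratify=y keeps the class proportions equal in the train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

test_class_counts = y_test.value_counts().sort_index()
print(test_class_counts)
```

On a small dataset like Iris, stratifying makes the reported test accuracy less dependent on which rows happen to land in the test set.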
Building a CatBoost Model
Now that our data is preprocessed, let’s create a CatBoost model. We’ll start with a basic model configuration:
# Create a CatBoost classifier
model = CatBoostClassifier(iterations=500, depth=6, learning_rate=0.1, loss_function='MultiClass')
# Fit the model on the training data
model.fit(X_train, y_train)
0: learn: 0.9813365 total: 137ms remaining: 1m 8s
1: learn: 0.8861315 total: 139ms remaining: 34.5s
2: learn: 0.8058427 total: 140ms remaining: 23.3s
3: learn: 0.7358212 total: 142ms remaining: 17.6s
4: learn: 0.6856585 total: 144ms remaining: 14.2s
5: learn: 0.6277489 total: 146ms remaining: 12s
6: learn: 0.5872592 total: 147ms remaining: 10.4s
7: learn: 0.5521121 total: 148ms remaining: 9.1s
8: learn: 0.5150186 total: 149ms remaining: 8.12s
9: learn: 0.4854003 total: 150ms remaining: 7.33s
10: learn: 0.4546241 total: 150ms remaining: 6.68s
11: learn: 0.4274622 total: 151ms remaining: 6.15s
12: learn: 0.4034791 total: 152ms remaining: 5.7s
13: learn: 0.3769770 total: 153ms remaining: 5.29s
14: learn: 0.3576667 total: 153ms remaining: 4.96s
15: learn: 0.3386418 total: 155ms remaining: 4.7s
16: learn: 0.3226843 total: 156ms remaining: 4.44s
17: learn: 0.3104870 total: 157ms remaining: 4.22s
18: learn: 0.2956921 total: 158ms remaining: 4.01s
19: learn: 0.2829591 total: 159ms remaining: 3.82s
20: learn: 0.2699720 total: 160ms remaining: 3.65s
21: learn: 0.2590053 total: 161ms remaining: 3.5s
22: learn: 0.2488321 total: 162ms remaining: 3.35s
23: learn: 0.2395776 total: 162ms remaining: 3.22s
24: learn: 0.2302715 total: 163ms remaining: 3.1s
25: learn: 0.2234203 total: 164ms remaining: 2.98s
26: learn: 0.2144221 total: 165ms remaining: 2.88s
27: learn: 0.2079915 total: 165ms remaining: 2.79s
28: learn: 0.2005255 total: 166ms remaining: 2.69s
29: learn: 0.1937367 total: 167ms remaining: 2.61s
30: learn: 0.1860074 total: 167ms remaining: 2.53s
31: learn: 0.1806887 total: 168ms remaining: 2.46s
32: learn: 0.1746325 total: 169ms remaining: 2.39s
33: learn: 0.1703708 total: 170ms remaining: 2.33s
34: learn: 0.1648839 total: 171ms remaining: 2.27s
35: learn: 0.1607991 total: 171ms remaining: 2.21s
36: learn: 0.1576853 total: 172ms remaining: 2.15s
37: learn: 0.1521127 total: 173ms remaining: 2.1s
38: learn: 0.1481313 total: 174ms remaining: 2.05s
39: learn: 0.1443709 total: 175ms remaining: 2.01s
40: learn: 0.1407797 total: 175ms remaining: 1.96s
41: learn: 0.1372416 total: 176ms remaining: 1.92s
42: learn: 0.1335051 total: 177ms remaining: 1.88s
43: learn: 0.1299110 total: 177ms remaining: 1.84s
44: learn: 0.1269549 total: 178ms remaining: 1.8s
45: learn: 0.1240262 total: 179ms remaining: 1.76s
46: learn: 0.1210560 total: 179ms remaining: 1.73s
47: learn: 0.1184652 total: 180ms remaining: 1.7s
48: learn: 0.1164967 total: 181ms remaining: 1.66s
49: learn: 0.1141904 total: 181ms remaining: 1.63s
50: learn: 0.1118333 total: 182ms remaining: 1.6s
51: learn: 0.1099596 total: 183ms remaining: 1.57s
52: learn: 0.1079690 total: 184ms remaining: 1.55s
53: learn: 0.1055933 total: 184ms remaining: 1.52s
54: learn: 0.1040959 total: 185ms remaining: 1.5s
55: learn: 0.1012368 total: 186ms remaining: 1.47s
56: learn: 0.0984991 total: 186ms remaining: 1.45s
57: learn: 0.0967660 total: 187ms remaining: 1.42s
58: learn: 0.0947463 total: 187ms remaining: 1.4s
59: learn: 0.0934101 total: 188ms remaining: 1.38s
60: learn: 0.0917338 total: 189ms remaining: 1.36s
61: learn: 0.0899675 total: 189ms remaining: 1.34s
62: learn: 0.0873840 total: 190ms remaining: 1.32s
63: learn: 0.0854522 total: 191ms remaining: 1.3s
64: learn: 0.0841932 total: 191ms remaining: 1.28s
65: learn: 0.0830201 total: 192ms remaining: 1.26s
66: learn: 0.0813040 total: 193ms remaining: 1.24s
67: learn: 0.0797996 total: 193ms remaining: 1.23s
68: learn: 0.0784237 total: 194ms remaining: 1.21s
69: learn: 0.0767871 total: 195ms remaining: 1.2s
70: learn: 0.0753030 total: 196ms remaining: 1.18s
71: learn: 0.0741743 total: 197ms remaining: 1.17s
72: learn: 0.0729802 total: 197ms remaining: 1.15s
73: learn: 0.0718280 total: 198ms remaining: 1.14s
74: learn: 0.0705972 total: 199ms remaining: 1.13s
75: learn: 0.0694299 total: 199ms remaining: 1.11s
76: learn: 0.0684781 total: 200ms remaining: 1.1s
77: learn: 0.0672792 total: 201ms remaining: 1.08s
78: learn: 0.0659845 total: 201ms remaining: 1.07s
79: learn: 0.0650208 total: 202ms remaining: 1.06s
80: learn: 0.0637061 total: 202ms remaining: 1.05s
81: learn: 0.0627809 total: 203ms remaining: 1.03s
82: learn: 0.0617096 total: 204ms remaining: 1.02s
83: learn: 0.0610447 total: 205ms remaining: 1.01s
84: learn: 0.0602128 total: 205ms remaining: 1s
85: learn: 0.0592277 total: 206ms remaining: 991ms
86: learn: 0.0583512 total: 206ms remaining: 980ms
87: learn: 0.0573394 total: 207ms remaining: 969ms
88: learn: 0.0567103 total: 208ms remaining: 960ms
89: learn: 0.0558138 total: 209ms remaining: 951ms
90: learn: 0.0550342 total: 210ms remaining: 942ms
91: learn: 0.0543715 total: 210ms remaining: 932ms
92: learn: 0.0538101 total: 211ms remaining: 922ms
93: learn: 0.0531655 total: 211ms remaining: 913ms
94: learn: 0.0523585 total: 212ms remaining: 904ms
95: learn: 0.0515770 total: 213ms remaining: 895ms
96: learn: 0.0509992 total: 213ms remaining: 887ms
97: learn: 0.0503180 total: 214ms remaining: 879ms
98: learn: 0.0497169 total: 215ms remaining: 871ms
99: learn: 0.0489591 total: 216ms remaining: 863ms
100: learn: 0.0482433 total: 216ms remaining: 855ms
101: learn: 0.0473777 total: 217ms remaining: 848ms
102: learn: 0.0467162 total: 218ms remaining: 840ms
103: learn: 0.0461578 total: 219ms remaining: 833ms
104: learn: 0.0456718 total: 220ms remaining: 826ms
105: learn: 0.0450778 total: 220ms remaining: 819ms
106: learn: 0.0447540 total: 221ms remaining: 812ms
107: learn: 0.0441458 total: 222ms remaining: 806ms
108: learn: 0.0436147 total: 223ms remaining: 800ms
109: learn: 0.0431599 total: 224ms remaining: 793ms
110: learn: 0.0426456 total: 224ms remaining: 786ms
111: learn: 0.0421912 total: 225ms remaining: 779ms
112: learn: 0.0418751 total: 226ms remaining: 773ms
113: learn: 0.0414768 total: 226ms remaining: 767ms
114: learn: 0.0411219 total: 227ms remaining: 760ms
115: learn: 0.0405597 total: 228ms remaining: 754ms
116: learn: 0.0400836 total: 228ms remaining: 748ms
117: learn: 0.0396947 total: 229ms remaining: 742ms
118: learn: 0.0392113 total: 230ms remaining: 736ms
119: learn: 0.0388474 total: 230ms remaining: 730ms
120: learn: 0.0384401 total: 231ms remaining: 724ms
121: learn: 0.0378449 total: 232ms remaining: 718ms
122: learn: 0.0373931 total: 232ms remaining: 712ms
123: learn: 0.0369356 total: 233ms remaining: 706ms
124: learn: 0.0365907 total: 234ms remaining: 701ms
125: learn: 0.0362963 total: 234ms remaining: 696ms
126: learn: 0.0360159 total: 235ms remaining: 691ms
127: learn: 0.0355305 total: 236ms remaining: 686ms
128: learn: 0.0351991 total: 237ms remaining: 680ms
129: learn: 0.0348148 total: 237ms remaining: 675ms
130: learn: 0.0345883 total: 238ms remaining: 670ms
131: learn: 0.0342270 total: 239ms remaining: 665ms
132: learn: 0.0339722 total: 239ms remaining: 660ms
133: learn: 0.0336862 total: 240ms remaining: 655ms
134: learn: 0.0333749 total: 240ms remaining: 650ms
135: learn: 0.0331462 total: 241ms remaining: 645ms
136: learn: 0.0328141 total: 242ms remaining: 640ms
137: learn: 0.0324689 total: 242ms remaining: 635ms
138: learn: 0.0321608 total: 243ms remaining: 631ms
139: learn: 0.0319591 total: 243ms remaining: 626ms
140: learn: 0.0316437 total: 244ms remaining: 621ms
141: learn: 0.0314192 total: 245ms remaining: 617ms
142: learn: 0.0311874 total: 245ms remaining: 612ms
143: learn: 0.0310029 total: 246ms remaining: 607ms
144: learn: 0.0307442 total: 246ms remaining: 603ms
145: learn: 0.0304999 total: 247ms remaining: 599ms
146: learn: 0.0301921 total: 248ms remaining: 595ms
147: learn: 0.0299496 total: 249ms remaining: 591ms
148: learn: 0.0297355 total: 249ms remaining: 587ms
149: learn: 0.0294643 total: 250ms remaining: 583ms
150: learn: 0.0292235 total: 250ms remaining: 579ms
151: learn: 0.0289936 total: 251ms remaining: 575ms
152: learn: 0.0287927 total: 252ms remaining: 571ms
153: learn: 0.0285373 total: 252ms remaining: 567ms
154: learn: 0.0282880 total: 253ms remaining: 562ms
155: learn: 0.0280995 total: 253ms remaining: 558ms
156: learn: 0.0278874 total: 254ms remaining: 554ms
157: learn: 0.0276427 total: 254ms remaining: 551ms
158: learn: 0.0274531 total: 255ms remaining: 547ms
159: learn: 0.0271323 total: 255ms remaining: 543ms
160: learn: 0.0269017 total: 256ms remaining: 539ms
161: learn: 0.0266901 total: 257ms remaining: 535ms
162: learn: 0.0265159 total: 257ms remaining: 531ms
163: learn: 0.0263174 total: 258ms remaining: 528ms
164: learn: 0.0261641 total: 258ms remaining: 524ms
165: learn: 0.0260273 total: 259ms remaining: 520ms
166: learn: 0.0258745 total: 259ms remaining: 517ms
167: learn: 0.0256712 total: 260ms remaining: 513ms
168: learn: 0.0254837 total: 260ms remaining: 510ms
169: learn: 0.0253326 total: 261ms remaining: 506ms
170: learn: 0.0251841 total: 262ms remaining: 503ms
171: learn: 0.0249344 total: 262ms remaining: 500ms
172: learn: 0.0247762 total: 263ms remaining: 497ms
173: learn: 0.0246360 total: 264ms remaining: 494ms
174: learn: 0.0244494 total: 264ms remaining: 491ms
175: learn: 0.0243041 total: 265ms remaining: 488ms
176: learn: 0.0241254 total: 265ms remaining: 484ms
177: learn: 0.0240198 total: 266ms remaining: 481ms
178: learn: 0.0238266 total: 267ms remaining: 478ms
179: learn: 0.0236839 total: 267ms remaining: 475ms
180: learn: 0.0234980 total: 268ms remaining: 472ms
181: learn: 0.0233823 total: 268ms remaining: 469ms
182: learn: 0.0232332 total: 269ms remaining: 466ms
183: learn: 0.0230969 total: 269ms remaining: 463ms
184: learn: 0.0229568 total: 270ms remaining: 460ms
185: learn: 0.0227912 total: 270ms remaining: 456ms
186: learn: 0.0226332 total: 271ms remaining: 453ms
187: learn: 0.0224962 total: 271ms remaining: 450ms
188: learn: 0.0223438 total: 272ms remaining: 448ms
189: learn: 0.0221629 total: 273ms remaining: 445ms
190: learn: 0.0220155 total: 273ms remaining: 442ms
191: learn: 0.0218276 total: 274ms remaining: 439ms
192: learn: 0.0217185 total: 274ms remaining: 436ms
193: learn: 0.0216028 total: 275ms remaining: 434ms
194: learn: 0.0214985 total: 276ms remaining: 431ms
195: learn: 0.0213761 total: 277ms remaining: 429ms
196: learn: 0.0212865 total: 277ms remaining: 427ms
197: learn: 0.0211358 total: 278ms remaining: 424ms
198: learn: 0.0210420 total: 279ms remaining: 421ms
199: learn: 0.0209234 total: 279ms remaining: 419ms
200: learn: 0.0208260 total: 280ms remaining: 416ms
201: learn: 0.0207213 total: 280ms remaining: 413ms
202: learn: 0.0205647 total: 281ms remaining: 411ms
203: learn: 0.0204304 total: 281ms remaining: 408ms
204: learn: 0.0203341 total: 282ms remaining: 405ms
205: learn: 0.0202026 total: 282ms remaining: 403ms
206: learn: 0.0200774 total: 283ms remaining: 400ms
207: learn: 0.0199344 total: 283ms remaining: 398ms
208: learn: 0.0198113 total: 284ms remaining: 395ms
209: learn: 0.0197190 total: 284ms remaining: 393ms
210: learn: 0.0196105 total: 285ms remaining: 390ms
211: learn: 0.0194789 total: 286ms remaining: 388ms
212: learn: 0.0193919 total: 286ms remaining: 385ms
213: learn: 0.0193191 total: 287ms remaining: 383ms
214: learn: 0.0192235 total: 287ms remaining: 381ms
215: learn: 0.0191393 total: 288ms remaining: 378ms
216: learn: 0.0190541 total: 288ms remaining: 376ms
217: learn: 0.0189043 total: 289ms remaining: 374ms
218: learn: 0.0188042 total: 290ms remaining: 372ms
219: learn: 0.0186925 total: 291ms remaining: 370ms
220: learn: 0.0185912 total: 291ms remaining: 368ms
221: learn: 0.0184455 total: 292ms remaining: 365ms
222: learn: 0.0183640 total: 292ms remaining: 363ms
223: learn: 0.0182678 total: 293ms remaining: 361ms
224: learn: 0.0181968 total: 293ms remaining: 359ms
225: learn: 0.0181030 total: 294ms remaining: 356ms
226: learn: 0.0180141 total: 295ms remaining: 354ms
227: learn: 0.0179371 total: 295ms remaining: 352ms
228: learn: 0.0178583 total: 296ms remaining: 350ms
229: learn: 0.0177846 total: 296ms remaining: 348ms
230: learn: 0.0176598 total: 297ms remaining: 345ms
231: learn: 0.0175682 total: 297ms remaining: 343ms
232: learn: 0.0174190 total: 298ms remaining: 341ms
233: learn: 0.0173246 total: 298ms remaining: 339ms
234: learn: 0.0172594 total: 299ms remaining: 337ms
235: learn: 0.0171925 total: 299ms remaining: 335ms
236: learn: 0.0170918 total: 300ms remaining: 333ms
237: learn: 0.0169931 total: 300ms remaining: 331ms
238: learn: 0.0169307 total: 301ms remaining: 329ms
239: learn: 0.0168460 total: 302ms remaining: 327ms
240: learn: 0.0167919 total: 302ms remaining: 325ms
241: learn: 0.0166981 total: 303ms remaining: 323ms
242: learn: 0.0166429 total: 304ms remaining: 321ms
243: learn: 0.0165717 total: 305ms remaining: 320ms
244: learn: 0.0165008 total: 305ms remaining: 318ms
245: learn: 0.0164158 total: 306ms remaining: 316ms
246: learn: 0.0163351 total: 306ms remaining: 314ms
247: learn: 0.0162674 total: 307ms remaining: 312ms
248: learn: 0.0161968 total: 307ms remaining: 310ms
249: learn: 0.0161346 total: 308ms remaining: 308ms
250: learn: 0.0160478 total: 308ms remaining: 306ms
251: learn: 0.0159925 total: 309ms remaining: 304ms
252: learn: 0.0159286 total: 309ms remaining: 302ms
253: learn: 0.0158469 total: 310ms remaining: 300ms
254: learn: 0.0157394 total: 310ms remaining: 298ms
255: learn: 0.0156765 total: 311ms remaining: 296ms
256: learn: 0.0155879 total: 311ms remaining: 294ms
257: learn: 0.0155219 total: 312ms remaining: 293ms
258: learn: 0.0154783 total: 313ms remaining: 291ms
259: learn: 0.0154224 total: 313ms remaining: 289ms
260: learn: 0.0153621 total: 314ms remaining: 287ms
261: learn: 0.0152921 total: 314ms remaining: 285ms
262: learn: 0.0151938 total: 315ms remaining: 284ms
263: learn: 0.0150795 total: 315ms remaining: 282ms
264: learn: 0.0150278 total: 316ms remaining: 280ms
265: learn: 0.0149744 total: 317ms remaining: 279ms
266: learn: 0.0149082 total: 318ms remaining: 277ms
267: learn: 0.0148362 total: 319ms remaining: 276ms
268: learn: 0.0147671 total: 319ms remaining: 274ms
269: learn: 0.0146938 total: 320ms remaining: 272ms
270: learn: 0.0145987 total: 320ms remaining: 271ms
271: learn: 0.0145527 total: 321ms remaining: 269ms
272: learn: 0.0144639 total: 321ms remaining: 267ms
273: learn: 0.0144124 total: 322ms remaining: 266ms
274: learn: 0.0143478 total: 322ms remaining: 264ms
275: learn: 0.0142858 total: 323ms remaining: 262ms
276: learn: 0.0142283 total: 324ms remaining: 261ms
277: learn: 0.0141439 total: 324ms remaining: 259ms
278: learn: 0.0140840 total: 325ms remaining: 257ms
279: learn: 0.0140204 total: 325ms remaining: 256ms
280: learn: 0.0139567 total: 326ms remaining: 254ms
281: learn: 0.0139085 total: 326ms remaining: 252ms
282: learn: 0.0138675 total: 327ms remaining: 251ms
283: learn: 0.0138165 total: 327ms remaining: 249ms
284: learn: 0.0137702 total: 328ms remaining: 247ms
285: learn: 0.0137259 total: 328ms remaining: 246ms
286: learn: 0.0136672 total: 329ms remaining: 244ms
287: learn: 0.0136176 total: 330ms remaining: 243ms
288: learn: 0.0135746 total: 330ms remaining: 241ms
289: learn: 0.0135198 total: 331ms remaining: 240ms
290: learn: 0.0134622 total: 332ms remaining: 238ms
291: learn: 0.0134201 total: 333ms remaining: 237ms
292: learn: 0.0133689 total: 333ms remaining: 236ms
293: learn: 0.0132793 total: 334ms remaining: 234ms
294: learn: 0.0132402 total: 335ms remaining: 233ms
295: learn: 0.0131801 total: 335ms remaining: 231ms
296: learn: 0.0131457 total: 336ms remaining: 230ms
297: learn: 0.0130898 total: 336ms remaining: 228ms
298: learn: 0.0130593 total: 337ms remaining: 227ms
299: learn: 0.0130122 total: 338ms remaining: 225ms
300: learn: 0.0129658 total: 338ms remaining: 224ms
301: learn: 0.0128880 total: 339ms remaining: 222ms
302: learn: 0.0128276 total: 339ms remaining: 221ms
303: learn: 0.0127843 total: 340ms remaining: 219ms
304: learn: 0.0127458 total: 340ms remaining: 218ms
305: learn: 0.0126691 total: 341ms remaining: 216ms
306: learn: 0.0126114 total: 342ms remaining: 215ms
307: learn: 0.0125656 total: 342ms remaining: 213ms
308: learn: 0.0125126 total: 343ms remaining: 212ms
309: learn: 0.0124746 total: 343ms remaining: 210ms
310: learn: 0.0124365 total: 344ms remaining: 209ms
311: learn: 0.0124002 total: 345ms remaining: 208ms
312: learn: 0.0123667 total: 346ms remaining: 207ms
313: learn: 0.0123186 total: 346ms remaining: 205ms
314: learn: 0.0122814 total: 347ms remaining: 204ms
315: learn: 0.0122297 total: 348ms remaining: 202ms
316: learn: 0.0121733 total: 348ms remaining: 201ms
317: learn: 0.0121371 total: 349ms remaining: 200ms
318: learn: 0.0120900 total: 349ms remaining: 198ms
319: learn: 0.0120464 total: 350ms remaining: 197ms
320: learn: 0.0119967 total: 350ms remaining: 195ms
321: learn: 0.0119658 total: 351ms remaining: 194ms
322: learn: 0.0119245 total: 351ms remaining: 193ms
323: learn: 0.0118890 total: 352ms remaining: 191ms
324: learn: 0.0118487 total: 353ms remaining: 190ms
325: learn: 0.0118105 total: 353ms remaining: 188ms
326: learn: 0.0117515 total: 354ms remaining: 187ms
327: learn: 0.0116953 total: 354ms remaining: 186ms
328: learn: 0.0116502 total: 355ms remaining: 184ms
329: learn: 0.0115900 total: 355ms remaining: 183ms
330: learn: 0.0115462 total: 356ms remaining: 182ms
331: learn: 0.0114960 total: 356ms remaining: 180ms
332: learn: 0.0114644 total: 357ms remaining: 179ms
333: learn: 0.0114374 total: 358ms remaining: 178ms
334: learn: 0.0113922 total: 359ms remaining: 177ms
335: learn: 0.0113605 total: 360ms remaining: 176ms
336: learn: 0.0113282 total: 361ms remaining: 174ms
337: learn: 0.0112944 total: 361ms remaining: 173ms
338: learn: 0.0112669 total: 362ms remaining: 172ms
339: learn: 0.0112261 total: 362ms remaining: 170ms
340: learn: 0.0111866 total: 363ms remaining: 169ms
341: learn: 0.0111513 total: 363ms remaining: 168ms
342: learn: 0.0111159 total: 364ms remaining: 167ms
343: learn: 0.0110875 total: 364ms remaining: 165ms
344: learn: 0.0110485 total: 365ms remaining: 164ms
345: learn: 0.0110119 total: 365ms remaining: 163ms
346: learn: 0.0109689 total: 366ms remaining: 161ms
347: learn: 0.0109289 total: 367ms remaining: 160ms
348: learn: 0.0108718 total: 367ms remaining: 159ms
349: learn: 0.0108328 total: 368ms remaining: 158ms
350: learn: 0.0107850 total: 368ms remaining: 156ms
351: learn: 0.0107481 total: 369ms remaining: 155ms
352: learn: 0.0107189 total: 369ms remaining: 154ms
353: learn: 0.0106873 total: 370ms remaining: 153ms
354: learn: 0.0106450 total: 370ms remaining: 151ms
355: learn: 0.0106054 total: 371ms remaining: 150ms
356: learn: 0.0105746 total: 372ms remaining: 149ms
357: learn: 0.0105430 total: 373ms remaining: 148ms
358: learn: 0.0105066 total: 374ms remaining: 147ms
359: learn: 0.0104587 total: 374ms remaining: 146ms
360: learn: 0.0104265 total: 375ms remaining: 144ms
361: learn: 0.0103897 total: 375ms remaining: 143ms
362: learn: 0.0103327 total: 376ms remaining: 142ms
363: learn: 0.0102953 total: 377ms remaining: 141ms
364: learn: 0.0102651 total: 377ms remaining: 139ms
365: learn: 0.0102346 total: 378ms remaining: 138ms
366: learn: 0.0102045 total: 378ms remaining: 137ms
367: learn: 0.0101695 total: 379ms remaining: 136ms
368: learn: 0.0101292 total: 379ms remaining: 135ms
369: learn: 0.0101072 total: 380ms remaining: 134ms
370: learn: 0.0100859 total: 381ms remaining: 132ms
371: learn: 0.0100314 total: 381ms remaining: 131ms
372: learn: 0.0099937 total: 382ms remaining: 130ms
373: learn: 0.0099609 total: 382ms remaining: 129ms
374: learn: 0.0099316 total: 383ms remaining: 128ms
375: learn: 0.0099035 total: 384ms remaining: 126ms
376: learn: 0.0098627 total: 384ms remaining: 125ms
377: learn: 0.0098270 total: 385ms remaining: 124ms
378: learn: 0.0098028 total: 386ms remaining: 123ms
379: learn: 0.0097738 total: 386ms remaining: 122ms
380: learn: 0.0097466 total: 387ms remaining: 121ms
381: learn: 0.0097135 total: 387ms remaining: 120ms
382: learn: 0.0096823 total: 388ms remaining: 119ms
383: learn: 0.0096481 total: 389ms remaining: 117ms
384: learn: 0.0096248 total: 389ms remaining: 116ms
385: learn: 0.0095828 total: 390ms remaining: 115ms
386: learn: 0.0095524 total: 390ms remaining: 114ms
387: learn: 0.0095322 total: 391ms remaining: 113ms
388: learn: 0.0095085 total: 391ms remaining: 112ms
389: learn: 0.0094847 total: 392ms remaining: 111ms
390: learn: 0.0094581 total: 392ms remaining: 109ms
391: learn: 0.0094356 total: 393ms remaining: 108ms
392: learn: 0.0094075 total: 394ms remaining: 107ms
393: learn: 0.0093884 total: 394ms remaining: 106ms
394: learn: 0.0093578 total: 395ms remaining: 105ms
395: learn: 0.0093364 total: 395ms remaining: 104ms
396: learn: 0.0093134 total: 396ms remaining: 103ms
397: learn: 0.0092902 total: 396ms remaining: 102ms
398: learn: 0.0092683 total: 397ms remaining: 100ms
399: learn: 0.0092486 total: 397ms remaining: 99.4ms
400: learn: 0.0092262 total: 398ms remaining: 98.3ms
401: learn: 0.0091973 total: 399ms remaining: 97.3ms
402: learn: 0.0091612 total: 400ms remaining: 96.2ms
403: learn: 0.0091296 total: 400ms remaining: 95.1ms
404: learn: 0.0090974 total: 401ms remaining: 94ms
405: learn: 0.0090669 total: 401ms remaining: 92.9ms
406: learn: 0.0090271 total: 402ms remaining: 91.8ms
407: learn: 0.0090075 total: 402ms remaining: 90.7ms
408: learn: 0.0089826 total: 403ms remaining: 89.7ms
409: learn: 0.0089578 total: 404ms remaining: 88.6ms
410: learn: 0.0089370 total: 404ms remaining: 87.5ms
411: learn: 0.0089122 total: 405ms remaining: 86.5ms
412: learn: 0.0088833 total: 405ms remaining: 85.4ms
413: learn: 0.0088605 total: 406ms remaining: 84.3ms
414: learn: 0.0088282 total: 407ms remaining: 83.3ms
415: learn: 0.0088085 total: 407ms remaining: 82.2ms
416: learn: 0.0087789 total: 408ms remaining: 81.1ms
417: learn: 0.0087506 total: 408ms remaining: 80.1ms
418: learn: 0.0087325 total: 409ms remaining: 79ms
419: learn: 0.0086985 total: 409ms remaining: 78ms
420: learn: 0.0086767 total: 410ms remaining: 76.9ms
421: learn: 0.0086540 total: 410ms remaining: 75.8ms
422: learn: 0.0086300 total: 411ms remaining: 74.8ms
423: learn: 0.0086089 total: 412ms remaining: 73.8ms
424: learn: 0.0085840 total: 412ms remaining: 72.8ms
425: learn: 0.0085631 total: 413ms remaining: 71.7ms
426: learn: 0.0085330 total: 414ms remaining: 70.8ms
427: learn: 0.0085071 total: 415ms remaining: 69.7ms
428: learn: 0.0084802 total: 415ms remaining: 68.7ms
429: learn: 0.0084636 total: 416ms remaining: 67.7ms
430: learn: 0.0084437 total: 417ms remaining: 66.7ms
431: learn: 0.0084275 total: 417ms remaining: 65.7ms
432: learn: 0.0084027 total: 418ms remaining: 64.7ms
433: learn: 0.0083845 total: 419ms remaining: 63.7ms
434: learn: 0.0083685 total: 419ms remaining: 62.6ms
435: learn: 0.0083451 total: 420ms remaining: 61.6ms
436: learn: 0.0083168 total: 420ms remaining: 60.6ms
437: learn: 0.0082985 total: 421ms remaining: 59.6ms
438: learn: 0.0082707 total: 421ms remaining: 58.5ms
439: learn: 0.0082491 total: 422ms remaining: 57.5ms
440: learn: 0.0082219 total: 422ms remaining: 56.5ms
441: learn: 0.0081919 total: 423ms remaining: 55.5ms
442: learn: 0.0081731 total: 423ms remaining: 54.5ms
443: learn: 0.0081589 total: 424ms remaining: 53.5ms
444: learn: 0.0081403 total: 424ms remaining: 52.4ms
445: learn: 0.0081171 total: 425ms remaining: 51.5ms
446: learn: 0.0081011 total: 426ms remaining: 50.5ms
447: learn: 0.0080789 total: 427ms remaining: 49.5ms
448: learn: 0.0080554 total: 427ms remaining: 48.5ms
449: learn: 0.0080404 total: 428ms remaining: 47.5ms
450: learn: 0.0080171 total: 429ms remaining: 46.6ms
451: learn: 0.0079952 total: 429ms remaining: 45.6ms
452: learn: 0.0079670 total: 430ms remaining: 44.6ms
453: learn: 0.0079420 total: 430ms remaining: 43.6ms
454: learn: 0.0079225 total: 431ms remaining: 42.6ms
455: learn: 0.0078918 total: 432ms remaining: 41.6ms
456: learn: 0.0078744 total: 432ms remaining: 40.7ms
457: learn: 0.0078534 total: 433ms remaining: 39.7ms
458: learn: 0.0078382 total: 433ms remaining: 38.7ms
459: learn: 0.0078107 total: 434ms remaining: 37.7ms
460: learn: 0.0077943 total: 434ms remaining: 36.7ms
461: learn: 0.0077754 total: 435ms remaining: 35.8ms
462: learn: 0.0077596 total: 435ms remaining: 34.8ms
463: learn: 0.0077402 total: 436ms remaining: 33.8ms
464: learn: 0.0077274 total: 436ms remaining: 32.8ms
465: learn: 0.0077073 total: 437ms remaining: 31.9ms
466: learn: 0.0076962 total: 437ms remaining: 30.9ms
467: learn: 0.0076773 total: 438ms remaining: 29.9ms
468: learn: 0.0076627 total: 438ms remaining: 29ms
469: learn: 0.0076465 total: 439ms remaining: 28ms
470: learn: 0.0076316 total: 440ms remaining: 27.1ms
471: learn: 0.0076157 total: 441ms remaining: 26.1ms
472: learn: 0.0075998 total: 442ms remaining: 25.2ms
473: learn: 0.0075819 total: 443ms remaining: 24.3ms
474: learn: 0.0075681 total: 444ms remaining: 23.3ms
475: learn: 0.0075496 total: 444ms remaining: 22.4ms
476: learn: 0.0075307 total: 445ms remaining: 21.5ms
477: learn: 0.0075120 total: 446ms remaining: 20.5ms
478: learn: 0.0074922 total: 447ms remaining: 19.6ms
479: learn: 0.0074764 total: 447ms remaining: 18.6ms
480: learn: 0.0074565 total: 448ms remaining: 17.7ms
481: learn: 0.0074448 total: 449ms remaining: 16.8ms
482: learn: 0.0074295 total: 449ms remaining: 15.8ms
483: learn: 0.0074159 total: 450ms remaining: 14.9ms
484: learn: 0.0074004 total: 451ms remaining: 13.9ms
485: learn: 0.0073754 total: 451ms remaining: 13ms
486: learn: 0.0073616 total: 452ms remaining: 12.1ms
487: learn: 0.0073481 total: 453ms remaining: 11.1ms
488: learn: 0.0073199 total: 454ms remaining: 10.2ms
489: learn: 0.0073043 total: 454ms remaining: 9.27ms
490: learn: 0.0072935 total: 455ms remaining: 8.34ms
491: learn: 0.0072780 total: 456ms remaining: 7.41ms
492: learn: 0.0072640 total: 456ms remaining: 6.48ms
493: learn: 0.0072470 total: 457ms remaining: 5.55ms
494: learn: 0.0072283 total: 458ms remaining: 4.62ms
495: learn: 0.0072170 total: 458ms remaining: 3.7ms
496: learn: 0.0072023 total: 459ms remaining: 2.77ms
497: learn: 0.0071900 total: 460ms remaining: 1.84ms
498: learn: 0.0071770 total: 460ms remaining: 922us
499: learn: 0.0071622 total: 461ms remaining: 0us
Evaluating the Model
To assess the model’s performance, we need to make predictions on the test set and compare them to the actual labels:
# Make predictions on the test data
y_pred = model.predict(X_test)
# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy}")
Model Accuracy: 1.0
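To make the metric concrete: accuracy is the fraction of test samples whose predicted label matches the true label. The sketch below recomputes it in plain Python on made-up labels (these are not the Iris results above). The helper names `flatten_labels` and `accuracy` are hypothetical. The flattening step assumes that `predict` on a `MultiClass` model returns one single-element row per sample, which is how CatBoost usually behaves.

```python
def flatten_labels(preds):
    """Turn [[0], [2], [1]] into [0, 2, 1]; flat lists pass through unchanged."""
    return [p[0] if isinstance(p, (list, tuple)) else p for p in preds]


def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    y_pred = flatten_labels(y_pred)
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Illustrative labels only (assumption: not taken from the model above)
y_true_demo = [0, 1, 2, 2, 1, 0]
y_pred_demo = [[0], [1], [2], [1], [1], [0]]  # column-shaped, one row per sample

demo_accuracy = accuracy(y_true_demo, y_pred_demo)
print(f"Demo accuracy: {demo_accuracy:.3f}")
```

With NumPy arrays, calling `y_pred.ravel()` before comparing does the same flattening.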
Visualizing the Results
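A single accuracy figure hides which classes the model confuses with one another. A confusion matrix makes that visible: each row is an actual class, each column is a predicted class, and off-diagonal cells count the mistakes. Let’s plot one for our model: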
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns
# Create a confusion matrix
cm = confusion_matrix(y_test, y_pred)
# Visualize the confusion matrix
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=iris.target_names, yticklabels=iris.target_names)
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()
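For readers who want to see what the heatmap is counting, here is a minimal plain-Python sketch of the same table. It follows the convention used by `sklearn.metrics.confusion_matrix` (rows are actual classes, columns are predicted classes). The function name `confusion_counts` and the labels are hypothetical and serve only as an illustration.

```python
def confusion_counts(y_true, y_pred, n_classes):
    """Build an n_classes x n_classes table: rows = actual, columns = predicted."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix


# Illustrative labels only (assumption: not the Iris predictions above)
y_true_demo = [0, 0, 1, 1, 2, 2, 2]
y_pred_demo = [0, 0, 1, 2, 2, 2, 1]

cm_demo = confusion_counts(y_true_demo, y_pred_demo, n_classes=3)

# Correct predictions sit on the diagonal, so accuracy is diagonal / total
diagonal = sum(cm_demo[i][i] for i in range(3))
total = sum(sum(row) for row in cm_demo)

for row in cm_demo:
    print(row)
print(f"Accuracy from matrix: {diagonal}/{total}")
```

A perfect classifier produces a matrix whose off-diagonal cells are all zero.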
Feature Importance
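CatBoost provides a straightforward way to measure feature importance, which is useful for feature selection. The 'LossFunctionChange' type used here scores each feature by how much the loss would change if the feature were removed, so it needs a dataset to evaluate against. Let’s visualize the importance of features in our model: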
# Get feature importance (LossFunctionChange requires a dataset, passed as a Pool)
from catboost import Pool
feature_importance = model.get_feature_importance(data=Pool(X_train, label=y_train), type='LossFunctionChange')
# Create a DataFrame to store feature names and their importance scores
feature_importance_df = pd.DataFrame({'Feature': X_train.columns, 'Importance': feature_importance})
# Sort the features by importance
feature_importance_df = feature_importance_df.sort_values(by='Importance', ascending=False)
# Plot feature importance
plt.figure(figsize=(10, 6))
plt.barh(feature_importance_df['Feature'], feature_importance_df['Importance'], color='skyblue')
# barh draws the first row at the bottom; invert so the most important feature is on top
plt.gca().invert_yaxis()
plt.xlabel('Feature Importance')
plt.ylabel('Feature')
plt.title('Feature Importance')
plt.show()
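Since the scores are meant to guide feature selection, one simple way to use them is to keep only the highest-ranked features. The sketch below is an illustration in plain Python. The helper `top_k_features` is hypothetical, and the scores are made-up numbers rather than output from the model above.

```python
def top_k_features(names, scores, k):
    """Return the names of the k features with the highest importance scores."""
    if len(names) != len(scores):
        raise ValueError("names and scores must have the same length")
    ranked = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Illustrative scores only (assumption: not produced by the trained model)
demo_names = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
]
demo_scores = [0.5, 0.1, 2.0, 1.5]

selected = top_k_features(demo_names, demo_scores, k=2)
print(selected)
```

With a pandas DataFrame, the selected names can be used directly as a column list, for example `X_train[selected]`, to retrain on the reduced feature set.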
Hyperparameter Tuning
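CatBoost offers various hyperparameters for fine-tuning the model. Here’s an example that sets the learning rate, the number of iterations, and the tree depth by hand. A lower learning rate usually needs more iterations to converge, so the two are typically adjusted together: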
# Hyperparameter tuning
params = {
    'iterations': 1000,
    'learning_rate': 0.05,
    'depth': 6,
    'loss_function': 'MultiClass',
}
tuned_model = CatBoostClassifier(**params)
tuned_model.fit(X_train, y_train)
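Setting values by hand is only a starting point. A systematic search tries several combinations and compares them with cross-validation. The sketch below shows one possible approach using CatBoost's built-in `grid_search` method. The grid values are illustrative assumptions, not recommendations, and the helpers `candidate_params` and `tune` are hypothetical names. The CatBoost import sits inside `tune` so the grid expansion can be run and inspected without the library installed.

```python
from itertools import product

# Hypothetical search space; the values are illustrative, not recommendations
param_grid = {
    "learning_rate": [0.03, 0.05, 0.1],
    "iterations": [200, 500, 1000],
}


def candidate_params(grid):
    """Expand a grid into the list of parameter combinations it describes."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


def tune(X, y, grid=param_grid):
    """Run CatBoost's grid search with 3-fold CV and return the best parameters."""
    from catboost import CatBoostClassifier  # requires catboost to be installed

    model = CatBoostClassifier(depth=6, loss_function="MultiClass", verbose=0)
    result = model.grid_search(grid, X=X, y=y, cv=3, verbose=False)
    return result["params"]


candidates = candidate_params(param_grid)
print(f"{len(candidates)} combinations will be evaluated")
```

Calling `tune(X_train, y_train)` would then return a dictionary such as the best `learning_rate` and `iterations` found. The search cost grows with the product of the list lengths, so it helps to keep the grid small.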
Conclusion
CatBoost is a strong addition to your Python machine learning toolkit. It handles categorical features natively, performs well with default settings, and needs little preprocessing, which makes it a compelling choice for many projects.
To master CatBoost, practice is key. Experiment with different datasets, hyperparameters, and techniques to see where it helps most and where it needs tuning.
This guide has taken you from the fundamentals of CatBoost in Python 3 through training, evaluation, visualization, feature importance, and hyperparameter tuning, with practical examples and plots. Keep the curiosity alive and keep experimenting. Happy learning and coding!