Master CatBoost in Python 3: Your Guide to Machine Learning Excellence

In machine learning with Python, choosing the right tools matters. One such tool, CatBoost, has earned a reputation for fast training and strong accuracy. In this guide, we’ll dive deep into CatBoost in Python 3, covering the fundamentals, advanced techniques, and practical examples, including a hands-on demonstration with a sample dataset and plots. By the end, you’ll be well on your way to mastering CatBoost and applying it in your own Python machine learning projects.

Unveiling CatBoost

CatBoost, short for Categorical Boosting, is a gradient boosting algorithm that has gained wide adoption in the machine learning community. What sets CatBoost apart is its focus on categorical features: it can consume them directly, without manual one-hot or label encoding. This is a real advantage when working with real-world data, where categorical variables are the norm.

CatBoost is a powerful gradient boosting algorithm for machine learning that has gained popularity for its ease of use, high predictive accuracy, and robustness in handling categorical features. It is particularly well-suited for classification and regression tasks. Here’s a more detailed explanation of CatBoost:

1. Gradient Boosting Algorithm:
CatBoost is based on gradient boosting, an ensemble learning technique that builds a predictive model by combining the predictions of many base models, typically decision trees. The trees are added one at a time, and each new tree is fit to reduce the errors (more precisely, the gradient of the loss function) left by the trees before it.
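To make the idea concrete, here is a minimal sketch of gradient boosting for squared error, written with NumPy only. It is not how CatBoost is implemented: the helper names (`fit_stump`, `predict_stump`, `boost`) and the toy sine data are assumptions made for illustration, and the base model is a one-split "stump" rather than a full tree.

```python
import numpy as np

def fit_stump(x, residuals):
    """Pick the threshold on a 1-D feature that best fits the residuals (least squares)."""
    best = None
    for threshold in np.unique(x)[:-1]:  # skip the max so the right side is never empty
        left, right = residuals[x <= threshold], residuals[x > threshold]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left.mean(), right.mean())
    return best[1:]  # (threshold, left_value, right_value)

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

def boost(x, y, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = y.mean()
    prediction = np.full(y.shape, base, dtype=float)
    stumps = []
    for _ in range(n_rounds):
        residuals = y - prediction  # negative gradient of the squared error
        stump = fit_stump(x, residuals)
        prediction += learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, stumps, prediction

# Toy data (invented for this sketch): one feature, a smooth target
x_demo = np.linspace(0.0, 6.0, 60)
y_demo = np.sin(x_demo)

base, stumps, fitted = boost(x_demo, y_demo)
baseline_mse = float(np.mean((y_demo - base) ** 2))
boosted_mse = float(np.mean((y_demo - fitted) ** 2))
print(f"baseline MSE: {baseline_mse:.4f}, boosted MSE: {boosted_mse:.4f}")
```

Each round only nudges the prediction by `learning_rate` times the stump output, which is why many small steps are needed and why the number of iterations and the learning rate are usually tuned together.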

2. Handling Categorical Features:
One of CatBoost’s standout features is its ability to handle categorical features without the need for preprocessing. Many machine learning algorithms require one-hot encoding or label encoding for categorical data, which can be cumbersome and can lead to increased dimensionality. CatBoost, however, can directly work with categorical features, making it more convenient for real-world datasets where categorical variables are common.

3. Efficient Learning:
CatBoost is designed for efficiency and speed. It includes several techniques that reduce overfitting and speed up training and prediction. These include ordered boosting, which is aimed at reducing overfitting, oblivious (symmetric) trees, which apply the same split across an entire tree level, and automatically generated combinations of categorical features.
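To give a feel for what an oblivious tree is, here is a rough NumPy sketch of how one could be evaluated. In such a tree every node on the same level uses the same feature and threshold, so the leaf for a row is just a bit pattern of the per-level test results. The function name and the example numbers are assumptions made for illustration, not CatBoost internals.

```python
import numpy as np

def oblivious_tree_predict(X, feature_idx, thresholds, leaf_values):
    """Evaluate a symmetric tree: level i tests X[:, feature_idx[i]] > thresholds[i]."""
    leaf_index = np.zeros(X.shape[0], dtype=int)
    for level, (feature, threshold) in enumerate(zip(feature_idx, thresholds)):
        # Each level contributes one bit to the leaf index
        leaf_index |= (X[:, feature] > threshold).astype(int) << level
    return leaf_values[leaf_index]

# A depth-2 tree has 2**2 = 4 leaves (values invented for the example)
X_small = np.array([[1.0, 5.0], [3.0, 5.0], [1.0, 7.0], [3.0, 7.0]])
leaf_values = np.array([10.0, 20.0, 30.0, 40.0])

out = oblivious_tree_predict(X_small, feature_idx=[0, 1], thresholds=[2.0, 6.0],
                             leaf_values=leaf_values)
print(out)  # [10. 20. 30. 40.]
```

Because the leaf lookup is a handful of comparisons and bit operations with no branching per node, this structure lends itself to fast, vectorized prediction.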

4. Regularization:
CatBoost incorporates L2 regularization, which helps control overfitting by adding a penalty term to the loss function. This regularization contributes to the model’s robustness and generalization.

5. Built-in Cross-Validation:
CatBoost simplifies the process of hyperparameter tuning and model selection by offering built-in cross-validation. This feature makes it easier to find the best set of hyperparameters for your specific dataset.

6. Default Parameter Tuning:
CatBoost is known for its well-tuned default hyperparameters. This means that even with minimal tuning, you can often achieve competitive results. This can be a time-saver for machine learning practitioners.

7. Support for Classification and Regression:
CatBoost is versatile and can be used for both classification and regression tasks. It can predict class labels and continuous values, making it suitable for a wide range of applications.

8. Integration with Popular Libraries:
CatBoost is well-integrated with popular Python libraries for data manipulation and analysis, such as Pandas and NumPy. This makes it easy to incorporate CatBoost into your existing machine learning workflow.

9. Model Interpretability:
While CatBoost is a powerful algorithm, it also provides tools for understanding model predictions. You can examine feature importances to determine which features are most influential in making predictions.

10. Active Development and Community Support:
CatBoost is an open-source library, originally developed at Yandex, and it is actively maintained. You can find extensive documentation, tutorials, and community support to help you get started and solve any issues you may encounter.

In summary, CatBoost is a powerful and efficient gradient boosting algorithm that simplifies the handling of categorical features and provides strong out-of-the-box performance. It’s an excellent choice for both beginners and experienced data scientists working on a variety of machine learning tasks, and it has found applications in fields like finance, healthcare, and e-commerce, among others. If you’re looking for a reliable and user-friendly algorithm to boost your machine learning projects, CatBoost is a solid choice.

Why Choose CatBoost?

CatBoost offers several compelling reasons to be your go-to choice for machine learning projects:

  1. Categorical Features Handling: CatBoost can naturally handle categorical features without the need for one-hot encoding or label encoding. This simplifies the data preparation process and often leads to better results.
  2. Speed: CatBoost is engineered for efficiency. It supports multi-core CPU and GPU training, and its symmetric tree structure keeps prediction fast, which is a big advantage when dealing with large datasets.
  3. Model Accuracy: Thanks to its careful handling of categorical features and its built-in regularization, CatBoost often achieves excellent predictive accuracy.
  4. Built-in Cross-Validation: CatBoost comes with a built-in cross-validation method that simplifies model tuning and selection.
  5. Great Out-of-the-Box Performance: CatBoost’s default hyperparameters are well-tuned, making it an attractive choice for quick experimentation.

Getting Started with CatBoost

Before we embark on our journey into the world of CatBoost, let’s ensure you have Python 3.x installed on your system. You can install CatBoost using pip:

pip install catboost

With CatBoost installed, let’s import the necessary libraries to kickstart our learning journey:

import numpy as np
import pandas as pd
import catboost
import matplotlib.pyplot as plt
from catboost import CatBoostClassifier, Pool
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

The Dataset

For our hands-on exploration of CatBoost, we’ll use the Iris dataset, a classic dataset in the world of machine learning. This dataset consists of features for three different species of iris flowers. Let’s load the Iris dataset and take a peek at the first few rows:

from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame
print(df.head())
    sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target
0                5.1               3.5                1.4               0.2       0
1                4.9               3.0                1.4               0.2       0
2                4.7               3.2                1.3               0.2       0
3                4.6               3.1                1.5               0.2       0
4                5.0               3.6                1.4               0.2       0

Data Exploration

Data exploration is the starting point for any machine learning project. It helps us understand the data’s characteristics. For the Iris dataset, let’s begin with basic statistics:

print(df.describe())
       sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)      target
count         150.000000        150.000000         150.000000        150.000000  150.000000
mean            5.843333          3.057333           3.758000          1.199333    1.000000
std             0.828066          0.435866           1.765298          0.762238    0.819232
min             4.300000          2.000000           1.000000          0.100000    0.000000
25%             5.100000          2.800000           1.600000          0.300000    0.000000
50%             5.800000          3.000000           4.350000          1.300000    1.000000
75%             6.400000          3.300000           5.100000          1.800000    2.000000
max             7.900000          4.400000           6.900000          2.500000    2.000000

Data Preprocessing

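The Iris dataset is already clean: it has no missing values and all four features are numeric, so there is nothing to encode. For completeness, we’ll still walk through the usual steps: drop rows with missing values (a no-op here), separate the features from the target, and split the dataset into training and testing sets: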

# Handle missing values if any
df.dropna(inplace=True)

# Split the data into features (X) and target (y)
X = df.drop('target', axis=1)
y = df['target']

# Split the dataset into a training set and a testing set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
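As an optional sanity check, the sketch below confirms that the data has no missing values and shows a stratified variant of the split. Passing `stratify` is an addition for illustration, not part of the code above; it keeps the three classes equally represented in the 30-row test set. The `_s` variable names are chosen so they don’t overwrite the ones used in this guide.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

df_check = load_iris(as_frame=True).frame
print("missing values:", int(df_check.isna().sum().sum()))  # 0

X_check = df_check.drop("target", axis=1)
y_check = df_check["target"]

# Same split sizes as above, but stratified by class label
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X_check, y_check, test_size=0.2, random_state=42, stratify=y_check
)

print(X_train_s.shape, X_test_s.shape)  # (120, 4) (30, 4)
class_counts = y_test_s.value_counts().sort_index().to_dict()
print(class_counts)  # {0: 10, 1: 10, 2: 10}
```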

Building a CatBoost Model

Now that our data is preprocessed, let’s create a CatBoost model. We’ll start with a basic model configuration:

# Create a CatBoost classifier
model = CatBoostClassifier(iterations=500, depth=6, learning_rate=0.1, loss_function='MultiClass')

# Fit the model on the training data
model.fit(X_train, y_train)
0:      learn: 0.9813365        total: 137ms    remaining: 1m 8s
1:      learn: 0.8861315        total: 139ms    remaining: 34.5s
2:      learn: 0.8058427        total: 140ms    remaining: 23.3s
3:      learn: 0.7358212        total: 142ms    remaining: 17.6s
4:      learn: 0.6856585        total: 144ms    remaining: 14.2s
5:      learn: 0.6277489        total: 146ms    remaining: 12s
6:      learn: 0.5872592        total: 147ms    remaining: 10.4s
7:      learn: 0.5521121        total: 148ms    remaining: 9.1s
8:      learn: 0.5150186        total: 149ms    remaining: 8.12s
9:      learn: 0.4854003        total: 150ms    remaining: 7.33s
10:     learn: 0.4546241        total: 150ms    remaining: 6.68s
11:     learn: 0.4274622        total: 151ms    remaining: 6.15s
12:     learn: 0.4034791        total: 152ms    remaining: 5.7s
13:     learn: 0.3769770        total: 153ms    remaining: 5.29s
14:     learn: 0.3576667        total: 153ms    remaining: 4.96s
15:     learn: 0.3386418        total: 155ms    remaining: 4.7s
16:     learn: 0.3226843        total: 156ms    remaining: 4.44s
17:     learn: 0.3104870        total: 157ms    remaining: 4.22s
18:     learn: 0.2956921        total: 158ms    remaining: 4.01s
19:     learn: 0.2829591        total: 159ms    remaining: 3.82s
20:     learn: 0.2699720        total: 160ms    remaining: 3.65s
21:     learn: 0.2590053        total: 161ms    remaining: 3.5s
22:     learn: 0.2488321        total: 162ms    remaining: 3.35s
23:     learn: 0.2395776        total: 162ms    remaining: 3.22s
24:     learn: 0.2302715        total: 163ms    remaining: 3.1s
25:     learn: 0.2234203        total: 164ms    remaining: 2.98s
26:     learn: 0.2144221        total: 165ms    remaining: 2.88s
27:     learn: 0.2079915        total: 165ms    remaining: 2.79s
28:     learn: 0.2005255        total: 166ms    remaining: 2.69s
29:     learn: 0.1937367        total: 167ms    remaining: 2.61s
30:     learn: 0.1860074        total: 167ms    remaining: 2.53s
31:     learn: 0.1806887        total: 168ms    remaining: 2.46s
32:     learn: 0.1746325        total: 169ms    remaining: 2.39s
33:     learn: 0.1703708        total: 170ms    remaining: 2.33s
34:     learn: 0.1648839        total: 171ms    remaining: 2.27s
35:     learn: 0.1607991        total: 171ms    remaining: 2.21s
36:     learn: 0.1576853        total: 172ms    remaining: 2.15s
37:     learn: 0.1521127        total: 173ms    remaining: 2.1s
38:     learn: 0.1481313        total: 174ms    remaining: 2.05s
39:     learn: 0.1443709        total: 175ms    remaining: 2.01s
40:     learn: 0.1407797        total: 175ms    remaining: 1.96s
41:     learn: 0.1372416        total: 176ms    remaining: 1.92s
42:     learn: 0.1335051        total: 177ms    remaining: 1.88s
43:     learn: 0.1299110        total: 177ms    remaining: 1.84s
44:     learn: 0.1269549        total: 178ms    remaining: 1.8s
45:     learn: 0.1240262        total: 179ms    remaining: 1.76s
46:     learn: 0.1210560        total: 179ms    remaining: 1.73s
47:     learn: 0.1184652        total: 180ms    remaining: 1.7s
48:     learn: 0.1164967        total: 181ms    remaining: 1.66s
49:     learn: 0.1141904        total: 181ms    remaining: 1.63s
50:     learn: 0.1118333        total: 182ms    remaining: 1.6s
51:     learn: 0.1099596        total: 183ms    remaining: 1.57s
52:     learn: 0.1079690        total: 184ms    remaining: 1.55s
53:     learn: 0.1055933        total: 184ms    remaining: 1.52s
54:     learn: 0.1040959        total: 185ms    remaining: 1.5s
55:     learn: 0.1012368        total: 186ms    remaining: 1.47s
56:     learn: 0.0984991        total: 186ms    remaining: 1.45s
57:     learn: 0.0967660        total: 187ms    remaining: 1.42s
58:     learn: 0.0947463        total: 187ms    remaining: 1.4s
59:     learn: 0.0934101        total: 188ms    remaining: 1.38s
60:     learn: 0.0917338        total: 189ms    remaining: 1.36s
61:     learn: 0.0899675        total: 189ms    remaining: 1.34s
62:     learn: 0.0873840        total: 190ms    remaining: 1.32s
63:     learn: 0.0854522        total: 191ms    remaining: 1.3s
64:     learn: 0.0841932        total: 191ms    remaining: 1.28s
65:     learn: 0.0830201        total: 192ms    remaining: 1.26s
66:     learn: 0.0813040        total: 193ms    remaining: 1.24s
67:     learn: 0.0797996        total: 193ms    remaining: 1.23s
68:     learn: 0.0784237        total: 194ms    remaining: 1.21s
69:     learn: 0.0767871        total: 195ms    remaining: 1.2s
70:     learn: 0.0753030        total: 196ms    remaining: 1.18s
71:     learn: 0.0741743        total: 197ms    remaining: 1.17s
72:     learn: 0.0729802        total: 197ms    remaining: 1.15s
73:     learn: 0.0718280        total: 198ms    remaining: 1.14s
74:     learn: 0.0705972        total: 199ms    remaining: 1.13s
75:     learn: 0.0694299        total: 199ms    remaining: 1.11s
76:     learn: 0.0684781        total: 200ms    remaining: 1.1s
77:     learn: 0.0672792        total: 201ms    remaining: 1.08s
78:     learn: 0.0659845        total: 201ms    remaining: 1.07s
79:     learn: 0.0650208        total: 202ms    remaining: 1.06s
80:     learn: 0.0637061        total: 202ms    remaining: 1.05s
81:     learn: 0.0627809        total: 203ms    remaining: 1.03s
82:     learn: 0.0617096        total: 204ms    remaining: 1.02s
83:     learn: 0.0610447        total: 205ms    remaining: 1.01s
84:     learn: 0.0602128        total: 205ms    remaining: 1s
85:     learn: 0.0592277        total: 206ms    remaining: 991ms
86:     learn: 0.0583512        total: 206ms    remaining: 980ms
87:     learn: 0.0573394        total: 207ms    remaining: 969ms
88:     learn: 0.0567103        total: 208ms    remaining: 960ms
89:     learn: 0.0558138        total: 209ms    remaining: 951ms
90:     learn: 0.0550342        total: 210ms    remaining: 942ms
91:     learn: 0.0543715        total: 210ms    remaining: 932ms
92:     learn: 0.0538101        total: 211ms    remaining: 922ms
93:     learn: 0.0531655        total: 211ms    remaining: 913ms
94:     learn: 0.0523585        total: 212ms    remaining: 904ms
95:     learn: 0.0515770        total: 213ms    remaining: 895ms
96:     learn: 0.0509992        total: 213ms    remaining: 887ms
97:     learn: 0.0503180        total: 214ms    remaining: 879ms
98:     learn: 0.0497169        total: 215ms    remaining: 871ms
99:     learn: 0.0489591        total: 216ms    remaining: 863ms
100:    learn: 0.0482433        total: 216ms    remaining: 855ms
101:    learn: 0.0473777        total: 217ms    remaining: 848ms
102:    learn: 0.0467162        total: 218ms    remaining: 840ms
103:    learn: 0.0461578        total: 219ms    remaining: 833ms
104:    learn: 0.0456718        total: 220ms    remaining: 826ms
105:    learn: 0.0450778        total: 220ms    remaining: 819ms
106:    learn: 0.0447540        total: 221ms    remaining: 812ms
107:    learn: 0.0441458        total: 222ms    remaining: 806ms
108:    learn: 0.0436147        total: 223ms    remaining: 800ms
109:    learn: 0.0431599        total: 224ms    remaining: 793ms
110:    learn: 0.0426456        total: 224ms    remaining: 786ms
111:    learn: 0.0421912        total: 225ms    remaining: 779ms
112:    learn: 0.0418751        total: 226ms    remaining: 773ms
113:    learn: 0.0414768        total: 226ms    remaining: 767ms
114:    learn: 0.0411219        total: 227ms    remaining: 760ms
115:    learn: 0.0405597        total: 228ms    remaining: 754ms
116:    learn: 0.0400836        total: 228ms    remaining: 748ms
117:    learn: 0.0396947        total: 229ms    remaining: 742ms
118:    learn: 0.0392113        total: 230ms    remaining: 736ms
119:    learn: 0.0388474        total: 230ms    remaining: 730ms
120:    learn: 0.0384401        total: 231ms    remaining: 724ms
121:    learn: 0.0378449        total: 232ms    remaining: 718ms
122:    learn: 0.0373931        total: 232ms    remaining: 712ms
123:    learn: 0.0369356        total: 233ms    remaining: 706ms
124:    learn: 0.0365907        total: 234ms    remaining: 701ms
125:    learn: 0.0362963        total: 234ms    remaining: 696ms
126:    learn: 0.0360159        total: 235ms    remaining: 691ms
127:    learn: 0.0355305        total: 236ms    remaining: 686ms
128:    learn: 0.0351991        total: 237ms    remaining: 680ms
129:    learn: 0.0348148        total: 237ms    remaining: 675ms
130:    learn: 0.0345883        total: 238ms    remaining: 670ms
131:    learn: 0.0342270        total: 239ms    remaining: 665ms
132:    learn: 0.0339722        total: 239ms    remaining: 660ms
133:    learn: 0.0336862        total: 240ms    remaining: 655ms
134:    learn: 0.0333749        total: 240ms    remaining: 650ms
135:    learn: 0.0331462        total: 241ms    remaining: 645ms
136:    learn: 0.0328141        total: 242ms    remaining: 640ms
137:    learn: 0.0324689        total: 242ms    remaining: 635ms
138:    learn: 0.0321608        total: 243ms    remaining: 631ms
139:    learn: 0.0319591        total: 243ms    remaining: 626ms
140:    learn: 0.0316437        total: 244ms    remaining: 621ms
141:    learn: 0.0314192        total: 245ms    remaining: 617ms
142:    learn: 0.0311874        total: 245ms    remaining: 612ms
143:    learn: 0.0310029        total: 246ms    remaining: 607ms
144:    learn: 0.0307442        total: 246ms    remaining: 603ms
145:    learn: 0.0304999        total: 247ms    remaining: 599ms
146:    learn: 0.0301921        total: 248ms    remaining: 595ms
147:    learn: 0.0299496        total: 249ms    remaining: 591ms
148:    learn: 0.0297355        total: 249ms    remaining: 587ms
149:    learn: 0.0294643        total: 250ms    remaining: 583ms
150:    learn: 0.0292235        total: 250ms    remaining: 579ms
151:    learn: 0.0289936        total: 251ms    remaining: 575ms
152:    learn: 0.0287927        total: 252ms    remaining: 571ms
153:    learn: 0.0285373        total: 252ms    remaining: 567ms
154:    learn: 0.0282880        total: 253ms    remaining: 562ms
155:    learn: 0.0280995        total: 253ms    remaining: 558ms
156:    learn: 0.0278874        total: 254ms    remaining: 554ms
157:    learn: 0.0276427        total: 254ms    remaining: 551ms
158:    learn: 0.0274531        total: 255ms    remaining: 547ms
159:    learn: 0.0271323        total: 255ms    remaining: 543ms
160:    learn: 0.0269017        total: 256ms    remaining: 539ms
161:    learn: 0.0266901        total: 257ms    remaining: 535ms
162:    learn: 0.0265159        total: 257ms    remaining: 531ms
163:    learn: 0.0263174        total: 258ms    remaining: 528ms
164:    learn: 0.0261641        total: 258ms    remaining: 524ms
165:    learn: 0.0260273        total: 259ms    remaining: 520ms
166:    learn: 0.0258745        total: 259ms    remaining: 517ms
167:    learn: 0.0256712        total: 260ms    remaining: 513ms
168:    learn: 0.0254837        total: 260ms    remaining: 510ms
169:    learn: 0.0253326        total: 261ms    remaining: 506ms
170:    learn: 0.0251841        total: 262ms    remaining: 503ms
171:    learn: 0.0249344        total: 262ms    remaining: 500ms
172:    learn: 0.0247762        total: 263ms    remaining: 497ms
173:    learn: 0.0246360        total: 264ms    remaining: 494ms
174:    learn: 0.0244494        total: 264ms    remaining: 491ms
175:    learn: 0.0243041        total: 265ms    remaining: 488ms
176:    learn: 0.0241254        total: 265ms    remaining: 484ms
177:    learn: 0.0240198        total: 266ms    remaining: 481ms
178:    learn: 0.0238266        total: 267ms    remaining: 478ms
179:    learn: 0.0236839        total: 267ms    remaining: 475ms
180:    learn: 0.0234980        total: 268ms    remaining: 472ms
181:    learn: 0.0233823        total: 268ms    remaining: 469ms
182:    learn: 0.0232332        total: 269ms    remaining: 466ms
183:    learn: 0.0230969        total: 269ms    remaining: 463ms
184:    learn: 0.0229568        total: 270ms    remaining: 460ms
185:    learn: 0.0227912        total: 270ms    remaining: 456ms
186:    learn: 0.0226332        total: 271ms    remaining: 453ms
187:    learn: 0.0224962        total: 271ms    remaining: 450ms
188:    learn: 0.0223438        total: 272ms    remaining: 448ms
189:    learn: 0.0221629        total: 273ms    remaining: 445ms
190:    learn: 0.0220155        total: 273ms    remaining: 442ms
191:    learn: 0.0218276        total: 274ms    remaining: 439ms
192:    learn: 0.0217185        total: 274ms    remaining: 436ms
193:    learn: 0.0216028        total: 275ms    remaining: 434ms
194:    learn: 0.0214985        total: 276ms    remaining: 431ms
195:    learn: 0.0213761        total: 277ms    remaining: 429ms
196:    learn: 0.0212865        total: 277ms    remaining: 427ms
197:    learn: 0.0211358        total: 278ms    remaining: 424ms
198:    learn: 0.0210420        total: 279ms    remaining: 421ms
199:    learn: 0.0209234        total: 279ms    remaining: 419ms
200:    learn: 0.0208260        total: 280ms    remaining: 416ms
201:    learn: 0.0207213        total: 280ms    remaining: 413ms
202:    learn: 0.0205647        total: 281ms    remaining: 411ms
203:    learn: 0.0204304        total: 281ms    remaining: 408ms
204:    learn: 0.0203341        total: 282ms    remaining: 405ms
205:    learn: 0.0202026        total: 282ms    remaining: 403ms
206:    learn: 0.0200774        total: 283ms    remaining: 400ms
207:    learn: 0.0199344        total: 283ms    remaining: 398ms
208:    learn: 0.0198113        total: 284ms    remaining: 395ms
209:    learn: 0.0197190        total: 284ms    remaining: 393ms
210:    learn: 0.0196105        total: 285ms    remaining: 390ms
211:    learn: 0.0194789        total: 286ms    remaining: 388ms
212:    learn: 0.0193919        total: 286ms    remaining: 385ms
213:    learn: 0.0193191        total: 287ms    remaining: 383ms
214:    learn: 0.0192235        total: 287ms    remaining: 381ms
215:    learn: 0.0191393        total: 288ms    remaining: 378ms
216:    learn: 0.0190541        total: 288ms    remaining: 376ms
217:    learn: 0.0189043        total: 289ms    remaining: 374ms
218:    learn: 0.0188042        total: 290ms    remaining: 372ms
219:    learn: 0.0186925        total: 291ms    remaining: 370ms
220:    learn: 0.0185912        total: 291ms    remaining: 368ms
221:    learn: 0.0184455        total: 292ms    remaining: 365ms
222:    learn: 0.0183640        total: 292ms    remaining: 363ms
223:    learn: 0.0182678        total: 293ms    remaining: 361ms
224:    learn: 0.0181968        total: 293ms    remaining: 359ms
225:    learn: 0.0181030        total: 294ms    remaining: 356ms
226:    learn: 0.0180141        total: 295ms    remaining: 354ms
227:    learn: 0.0179371        total: 295ms    remaining: 352ms
228:    learn: 0.0178583        total: 296ms    remaining: 350ms
229:    learn: 0.0177846        total: 296ms    remaining: 348ms
230:    learn: 0.0176598        total: 297ms    remaining: 345ms
231:    learn: 0.0175682        total: 297ms    remaining: 343ms
232:    learn: 0.0174190        total: 298ms    remaining: 341ms
233:    learn: 0.0173246        total: 298ms    remaining: 339ms
234:    learn: 0.0172594        total: 299ms    remaining: 337ms
235:    learn: 0.0171925        total: 299ms    remaining: 335ms
236:    learn: 0.0170918        total: 300ms    remaining: 333ms
237:    learn: 0.0169931        total: 300ms    remaining: 331ms
238:    learn: 0.0169307        total: 301ms    remaining: 329ms
239:    learn: 0.0168460        total: 302ms    remaining: 327ms
240:    learn: 0.0167919        total: 302ms    remaining: 325ms
241:    learn: 0.0166981        total: 303ms    remaining: 323ms
242:    learn: 0.0166429        total: 304ms    remaining: 321ms
243:    learn: 0.0165717        total: 305ms    remaining: 320ms
244:    learn: 0.0165008        total: 305ms    remaining: 318ms
245:    learn: 0.0164158        total: 306ms    remaining: 316ms
246:    learn: 0.0163351        total: 306ms    remaining: 314ms
247:    learn: 0.0162674        total: 307ms    remaining: 312ms
248:    learn: 0.0161968        total: 307ms    remaining: 310ms
249:    learn: 0.0161346        total: 308ms    remaining: 308ms
250:    learn: 0.0160478        total: 308ms    remaining: 306ms
251:    learn: 0.0159925        total: 309ms    remaining: 304ms
252:    learn: 0.0159286        total: 309ms    remaining: 302ms
253:    learn: 0.0158469        total: 310ms    remaining: 300ms
254:    learn: 0.0157394        total: 310ms    remaining: 298ms
255:    learn: 0.0156765        total: 311ms    remaining: 296ms
256:    learn: 0.0155879        total: 311ms    remaining: 294ms
257:    learn: 0.0155219        total: 312ms    remaining: 293ms
258:    learn: 0.0154783        total: 313ms    remaining: 291ms
259:    learn: 0.0154224        total: 313ms    remaining: 289ms
260:    learn: 0.0153621        total: 314ms    remaining: 287ms
261:    learn: 0.0152921        total: 314ms    remaining: 285ms
262:    learn: 0.0151938        total: 315ms    remaining: 284ms
263:    learn: 0.0150795        total: 315ms    remaining: 282ms
264:    learn: 0.0150278        total: 316ms    remaining: 280ms
265:    learn: 0.0149744        total: 317ms    remaining: 279ms
266:    learn: 0.0149082        total: 318ms    remaining: 277ms
267:    learn: 0.0148362        total: 319ms    remaining: 276ms
268:    learn: 0.0147671        total: 319ms    remaining: 274ms
269:    learn: 0.0146938        total: 320ms    remaining: 272ms
270:    learn: 0.0145987        total: 320ms    remaining: 271ms
271:    learn: 0.0145527        total: 321ms    remaining: 269ms
272:    learn: 0.0144639        total: 321ms    remaining: 267ms
273:    learn: 0.0144124        total: 322ms    remaining: 266ms
274:    learn: 0.0143478        total: 322ms    remaining: 264ms
275:    learn: 0.0142858        total: 323ms    remaining: 262ms
276:    learn: 0.0142283        total: 324ms    remaining: 261ms
277:    learn: 0.0141439        total: 324ms    remaining: 259ms
278:    learn: 0.0140840        total: 325ms    remaining: 257ms
279:    learn: 0.0140204        total: 325ms    remaining: 256ms
280:    learn: 0.0139567        total: 326ms    remaining: 254ms
281:    learn: 0.0139085        total: 326ms    remaining: 252ms
282:    learn: 0.0138675        total: 327ms    remaining: 251ms
283:    learn: 0.0138165        total: 327ms    remaining: 249ms
284:    learn: 0.0137702        total: 328ms    remaining: 247ms
285:    learn: 0.0137259        total: 328ms    remaining: 246ms
286:    learn: 0.0136672        total: 329ms    remaining: 244ms
287:    learn: 0.0136176        total: 330ms    remaining: 243ms
288:    learn: 0.0135746        total: 330ms    remaining: 241ms
289:    learn: 0.0135198        total: 331ms    remaining: 240ms
290:    learn: 0.0134622        total: 332ms    remaining: 238ms
291:    learn: 0.0134201        total: 333ms    remaining: 237ms
292:    learn: 0.0133689        total: 333ms    remaining: 236ms
293:    learn: 0.0132793        total: 334ms    remaining: 234ms
294:    learn: 0.0132402        total: 335ms    remaining: 233ms
295:    learn: 0.0131801        total: 335ms    remaining: 231ms
296:    learn: 0.0131457        total: 336ms    remaining: 230ms
297:    learn: 0.0130898        total: 336ms    remaining: 228ms
298:    learn: 0.0130593        total: 337ms    remaining: 227ms
299:    learn: 0.0130122        total: 338ms    remaining: 225ms
300:    learn: 0.0129658        total: 338ms    remaining: 224ms
301:    learn: 0.0128880        total: 339ms    remaining: 222ms
302:    learn: 0.0128276        total: 339ms    remaining: 221ms
303:    learn: 0.0127843        total: 340ms    remaining: 219ms
304:    learn: 0.0127458        total: 340ms    remaining: 218ms
305:    learn: 0.0126691        total: 341ms    remaining: 216ms
306:    learn: 0.0126114        total: 342ms    remaining: 215ms
307:    learn: 0.0125656        total: 342ms    remaining: 213ms
308:    learn: 0.0125126        total: 343ms    remaining: 212ms
309:    learn: 0.0124746        total: 343ms    remaining: 210ms
310:    learn: 0.0124365        total: 344ms    remaining: 209ms
311:    learn: 0.0124002        total: 345ms    remaining: 208ms
312:    learn: 0.0123667        total: 346ms    remaining: 207ms
313:    learn: 0.0123186        total: 346ms    remaining: 205ms
314:    learn: 0.0122814        total: 347ms    remaining: 204ms
315:    learn: 0.0122297        total: 348ms    remaining: 202ms
316:    learn: 0.0121733        total: 348ms    remaining: 201ms
317:    learn: 0.0121371        total: 349ms    remaining: 200ms
318:    learn: 0.0120900        total: 349ms    remaining: 198ms
319:    learn: 0.0120464        total: 350ms    remaining: 197ms
320:    learn: 0.0119967        total: 350ms    remaining: 195ms
321:    learn: 0.0119658        total: 351ms    remaining: 194ms
322:    learn: 0.0119245        total: 351ms    remaining: 193ms
323:    learn: 0.0118890        total: 352ms    remaining: 191ms
324:    learn: 0.0118487        total: 353ms    remaining: 190ms
325:    learn: 0.0118105        total: 353ms    remaining: 188ms
326:    learn: 0.0117515        total: 354ms    remaining: 187ms
327:    learn: 0.0116953        total: 354ms    remaining: 186ms
328:    learn: 0.0116502        total: 355ms    remaining: 184ms
329:    learn: 0.0115900        total: 355ms    remaining: 183ms
330:    learn: 0.0115462        total: 356ms    remaining: 182ms
331:    learn: 0.0114960        total: 356ms    remaining: 180ms
332:    learn: 0.0114644        total: 357ms    remaining: 179ms
333:    learn: 0.0114374        total: 358ms    remaining: 178ms
334:    learn: 0.0113922        total: 359ms    remaining: 177ms
335:    learn: 0.0113605        total: 360ms    remaining: 176ms
336:    learn: 0.0113282        total: 361ms    remaining: 174ms
337:    learn: 0.0112944        total: 361ms    remaining: 173ms
338:    learn: 0.0112669        total: 362ms    remaining: 172ms
339:    learn: 0.0112261        total: 362ms    remaining: 170ms
340:    learn: 0.0111866        total: 363ms    remaining: 169ms
341:    learn: 0.0111513        total: 363ms    remaining: 168ms
342:    learn: 0.0111159        total: 364ms    remaining: 167ms
343:    learn: 0.0110875        total: 364ms    remaining: 165ms
344:    learn: 0.0110485        total: 365ms    remaining: 164ms
345:    learn: 0.0110119        total: 365ms    remaining: 163ms
346:    learn: 0.0109689        total: 366ms    remaining: 161ms
347:    learn: 0.0109289        total: 367ms    remaining: 160ms
348:    learn: 0.0108718        total: 367ms    remaining: 159ms
349:    learn: 0.0108328        total: 368ms    remaining: 158ms
350:    learn: 0.0107850        total: 368ms    remaining: 156ms
351:    learn: 0.0107481        total: 369ms    remaining: 155ms
352:    learn: 0.0107189        total: 369ms    remaining: 154ms
353:    learn: 0.0106873        total: 370ms    remaining: 153ms
354:    learn: 0.0106450        total: 370ms    remaining: 151ms
355:    learn: 0.0106054        total: 371ms    remaining: 150ms
356:    learn: 0.0105746        total: 372ms    remaining: 149ms
357:    learn: 0.0105430        total: 373ms    remaining: 148ms
358:    learn: 0.0105066        total: 374ms    remaining: 147ms
359:    learn: 0.0104587        total: 374ms    remaining: 146ms
360:    learn: 0.0104265        total: 375ms    remaining: 144ms
361:    learn: 0.0103897        total: 375ms    remaining: 143ms
362:    learn: 0.0103327        total: 376ms    remaining: 142ms
363:    learn: 0.0102953        total: 377ms    remaining: 141ms
364:    learn: 0.0102651        total: 377ms    remaining: 139ms
365:    learn: 0.0102346        total: 378ms    remaining: 138ms
366:    learn: 0.0102045        total: 378ms    remaining: 137ms
367:    learn: 0.0101695        total: 379ms    remaining: 136ms
368:    learn: 0.0101292        total: 379ms    remaining: 135ms
369:    learn: 0.0101072        total: 380ms    remaining: 134ms
370:    learn: 0.0100859        total: 381ms    remaining: 132ms
371:    learn: 0.0100314        total: 381ms    remaining: 131ms
372:    learn: 0.0099937        total: 382ms    remaining: 130ms
373:    learn: 0.0099609        total: 382ms    remaining: 129ms
374:    learn: 0.0099316        total: 383ms    remaining: 128ms
375:    learn: 0.0099035        total: 384ms    remaining: 126ms
376:    learn: 0.0098627        total: 384ms    remaining: 125ms
377:    learn: 0.0098270        total: 385ms    remaining: 124ms
378:    learn: 0.0098028        total: 386ms    remaining: 123ms
379:    learn: 0.0097738        total: 386ms    remaining: 122ms
380:    learn: 0.0097466        total: 387ms    remaining: 121ms
381:    learn: 0.0097135        total: 387ms    remaining: 120ms
382:    learn: 0.0096823        total: 388ms    remaining: 119ms
383:    learn: 0.0096481        total: 389ms    remaining: 117ms
384:    learn: 0.0096248        total: 389ms    remaining: 116ms
385:    learn: 0.0095828        total: 390ms    remaining: 115ms
386:    learn: 0.0095524        total: 390ms    remaining: 114ms
387:    learn: 0.0095322        total: 391ms    remaining: 113ms
388:    learn: 0.0095085        total: 391ms    remaining: 112ms
389:    learn: 0.0094847        total: 392ms    remaining: 111ms
390:    learn: 0.0094581        total: 392ms    remaining: 109ms
391:    learn: 0.0094356        total: 393ms    remaining: 108ms
392:    learn: 0.0094075        total: 394ms    remaining: 107ms
393:    learn: 0.0093884        total: 394ms    remaining: 106ms
394:    learn: 0.0093578        total: 395ms    remaining: 105ms
395:    learn: 0.0093364        total: 395ms    remaining: 104ms
396:    learn: 0.0093134        total: 396ms    remaining: 103ms
397:    learn: 0.0092902        total: 396ms    remaining: 102ms
398:    learn: 0.0092683        total: 397ms    remaining: 100ms
399:    learn: 0.0092486        total: 397ms    remaining: 99.4ms
400:    learn: 0.0092262        total: 398ms    remaining: 98.3ms
401:    learn: 0.0091973        total: 399ms    remaining: 97.3ms
402:    learn: 0.0091612        total: 400ms    remaining: 96.2ms
403:    learn: 0.0091296        total: 400ms    remaining: 95.1ms
404:    learn: 0.0090974        total: 401ms    remaining: 94ms
405:    learn: 0.0090669        total: 401ms    remaining: 92.9ms
406:    learn: 0.0090271        total: 402ms    remaining: 91.8ms
407:    learn: 0.0090075        total: 402ms    remaining: 90.7ms
408:    learn: 0.0089826        total: 403ms    remaining: 89.7ms
409:    learn: 0.0089578        total: 404ms    remaining: 88.6ms
410:    learn: 0.0089370        total: 404ms    remaining: 87.5ms
411:    learn: 0.0089122        total: 405ms    remaining: 86.5ms
412:    learn: 0.0088833        total: 405ms    remaining: 85.4ms
413:    learn: 0.0088605        total: 406ms    remaining: 84.3ms
414:    learn: 0.0088282        total: 407ms    remaining: 83.3ms
415:    learn: 0.0088085        total: 407ms    remaining: 82.2ms
416:    learn: 0.0087789        total: 408ms    remaining: 81.1ms
417:    learn: 0.0087506        total: 408ms    remaining: 80.1ms
418:    learn: 0.0087325        total: 409ms    remaining: 79ms
419:    learn: 0.0086985        total: 409ms    remaining: 78ms
420:    learn: 0.0086767        total: 410ms    remaining: 76.9ms
421:    learn: 0.0086540        total: 410ms    remaining: 75.8ms
422:    learn: 0.0086300        total: 411ms    remaining: 74.8ms
423:    learn: 0.0086089        total: 412ms    remaining: 73.8ms
424:    learn: 0.0085840        total: 412ms    remaining: 72.8ms
425:    learn: 0.0085631        total: 413ms    remaining: 71.7ms
426:    learn: 0.0085330        total: 414ms    remaining: 70.8ms
427:    learn: 0.0085071        total: 415ms    remaining: 69.7ms
428:    learn: 0.0084802        total: 415ms    remaining: 68.7ms
429:    learn: 0.0084636        total: 416ms    remaining: 67.7ms
430:    learn: 0.0084437        total: 417ms    remaining: 66.7ms
431:    learn: 0.0084275        total: 417ms    remaining: 65.7ms
432:    learn: 0.0084027        total: 418ms    remaining: 64.7ms
433:    learn: 0.0083845        total: 419ms    remaining: 63.7ms
434:    learn: 0.0083685        total: 419ms    remaining: 62.6ms
435:    learn: 0.0083451        total: 420ms    remaining: 61.6ms
436:    learn: 0.0083168        total: 420ms    remaining: 60.6ms
437:    learn: 0.0082985        total: 421ms    remaining: 59.6ms
438:    learn: 0.0082707        total: 421ms    remaining: 58.5ms
439:    learn: 0.0082491        total: 422ms    remaining: 57.5ms
440:    learn: 0.0082219        total: 422ms    remaining: 56.5ms
441:    learn: 0.0081919        total: 423ms    remaining: 55.5ms
442:    learn: 0.0081731        total: 423ms    remaining: 54.5ms
443:    learn: 0.0081589        total: 424ms    remaining: 53.5ms
444:    learn: 0.0081403        total: 424ms    remaining: 52.4ms
445:    learn: 0.0081171        total: 425ms    remaining: 51.5ms
446:    learn: 0.0081011        total: 426ms    remaining: 50.5ms
447:    learn: 0.0080789        total: 427ms    remaining: 49.5ms
448:    learn: 0.0080554        total: 427ms    remaining: 48.5ms
449:    learn: 0.0080404        total: 428ms    remaining: 47.5ms
450:    learn: 0.0080171        total: 429ms    remaining: 46.6ms
451:    learn: 0.0079952        total: 429ms    remaining: 45.6ms
452:    learn: 0.0079670        total: 430ms    remaining: 44.6ms
453:    learn: 0.0079420        total: 430ms    remaining: 43.6ms
454:    learn: 0.0079225        total: 431ms    remaining: 42.6ms
455:    learn: 0.0078918        total: 432ms    remaining: 41.6ms
456:    learn: 0.0078744        total: 432ms    remaining: 40.7ms
457:    learn: 0.0078534        total: 433ms    remaining: 39.7ms
458:    learn: 0.0078382        total: 433ms    remaining: 38.7ms
459:    learn: 0.0078107        total: 434ms    remaining: 37.7ms
460:    learn: 0.0077943        total: 434ms    remaining: 36.7ms
461:    learn: 0.0077754        total: 435ms    remaining: 35.8ms
462:    learn: 0.0077596        total: 435ms    remaining: 34.8ms
463:    learn: 0.0077402        total: 436ms    remaining: 33.8ms
464:    learn: 0.0077274        total: 436ms    remaining: 32.8ms
465:    learn: 0.0077073        total: 437ms    remaining: 31.9ms
466:    learn: 0.0076962        total: 437ms    remaining: 30.9ms
467:    learn: 0.0076773        total: 438ms    remaining: 29.9ms
468:    learn: 0.0076627        total: 438ms    remaining: 29ms
469:    learn: 0.0076465        total: 439ms    remaining: 28ms
470:    learn: 0.0076316        total: 440ms    remaining: 27.1ms
471:    learn: 0.0076157        total: 441ms    remaining: 26.1ms
472:    learn: 0.0075998        total: 442ms    remaining: 25.2ms
473:    learn: 0.0075819        total: 443ms    remaining: 24.3ms
474:    learn: 0.0075681        total: 444ms    remaining: 23.3ms
475:    learn: 0.0075496        total: 444ms    remaining: 22.4ms
476:    learn: 0.0075307        total: 445ms    remaining: 21.5ms
477:    learn: 0.0075120        total: 446ms    remaining: 20.5ms
478:    learn: 0.0074922        total: 447ms    remaining: 19.6ms
479:    learn: 0.0074764        total: 447ms    remaining: 18.6ms
480:    learn: 0.0074565        total: 448ms    remaining: 17.7ms
481:    learn: 0.0074448        total: 449ms    remaining: 16.8ms
482:    learn: 0.0074295        total: 449ms    remaining: 15.8ms
483:    learn: 0.0074159        total: 450ms    remaining: 14.9ms
484:    learn: 0.0074004        total: 451ms    remaining: 13.9ms
485:    learn: 0.0073754        total: 451ms    remaining: 13ms
486:    learn: 0.0073616        total: 452ms    remaining: 12.1ms
487:    learn: 0.0073481        total: 453ms    remaining: 11.1ms
488:    learn: 0.0073199        total: 454ms    remaining: 10.2ms
489:    learn: 0.0073043        total: 454ms    remaining: 9.27ms
490:    learn: 0.0072935        total: 455ms    remaining: 8.34ms
491:    learn: 0.0072780        total: 456ms    remaining: 7.41ms
492:    learn: 0.0072640        total: 456ms    remaining: 6.48ms
493:    learn: 0.0072470        total: 457ms    remaining: 5.55ms
494:    learn: 0.0072283        total: 458ms    remaining: 4.62ms
495:    learn: 0.0072170        total: 458ms    remaining: 3.7ms
496:    learn: 0.0072023        total: 459ms    remaining: 2.77ms
497:    learn: 0.0071900        total: 460ms    remaining: 1.84ms
498:    learn: 0.0071770        total: 460ms    remaining: 922us
499:    learn: 0.0071622        total: 461ms    remaining: 0us

Evaluating the Model

To assess the model’s performance, we need to make predictions on the test set and compare them to the actual labels:

# Make predictions on the test data
y_pred = model.predict(X_test)

# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy}")
Model Accuracy: 1.0

Visualizing the Results

Visualization is a powerful tool to comprehend your model’s performance. Let’s create a confusion matrix to visualize how well our model is doing:

from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt  # harmless if already imported earlier
import seaborn as sns

# Create a confusion matrix
cm = confusion_matrix(y_test, y_pred)

# Visualize the confusion matrix
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=iris.target_names, yticklabels=iris.target_names)
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()
Figure: Confusion matrix of the CatBoost model on the test set.
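
To read a confusion matrix numerically, recall and precision per class can be derived from its rows and columns. The sketch below uses a hypothetical matrix with made-up counts (cm_example is not the result from this guide) and follows scikit-learn's convention that rows are actual classes and columns are predicted classes.

```python
import numpy as np

# Hypothetical 3-class confusion matrix (made-up counts, not this guide's results).
# Rows are actual classes, columns are predicted classes.
cm_example = np.array([
    [10, 0, 0],
    [0, 8, 2],
    [0, 1, 9],
])

# Correct predictions sit on the diagonal
correct = np.diag(cm_example)

# Recall: share of each actual class that was predicted correctly (row-wise)
recall = correct / cm_example.sum(axis=1)

# Precision: share of each predicted class that was actually correct (column-wise)
precision = correct / cm_example.sum(axis=0)

# Overall accuracy: all correct predictions over all samples
accuracy = correct.sum() / cm_example.sum()

print("Recall per class:   ", recall)
print("Precision per class:", precision)
print("Accuracy:           ", accuracy)
```

With the real matrix from this guide, cm can be substituted for cm_example.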

Feature Importance

CatBoost provides a straightforward way to determine feature importance, crucial for feature selection. Let’s visualize the importance of features in our model:

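from catboost import Pool  # harmless if already imported earlier

# Get feature importance (LossFunctionChange needs a dataset, passed here as a Pool)
feature_importance = model.get_feature_importance(data=Pool(X_train, label=y_train), type='LossFunctionChange')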

# Create a DataFrame to store feature names and their importance scores
feature_importance_df = pd.DataFrame({'Feature': X_train.columns, 'Importance': feature_importance})

# Sort the features by importance
feature_importance_df = feature_importance_df.sort_values(by='Importance', ascending=False)

# Plot feature importance
plt.figure(figsize=(10, 6))
plt.barh(feature_importance_df['Feature'], feature_importance_df['Importance'], color='skyblue')
plt.xlabel('Feature Importance')
plt.ylabel('Feature')
plt.title('Feature Importance')
plt.show()
Figure: Feature importance from the CatBoost model.

Hyperparameter Tuning

CatBoost exposes many hyperparameters for fine-tuning the model. Here’s an example that sets the number of iterations, the learning rate, and the tree depth explicitly:

# Hyperparameter tuning
params = {
    'iterations': 1000,
    'learning_rate': 0.05,
    'depth': 6,
    'loss_function': 'MultiClass',
}

tuned_model = CatBoostClassifier(**params)
tuned_model.fit(X_train, y_train)

Conclusion

CatBoost is a formidable addition to your Python machine learning toolkit. Its native handling of categorical features, strong out-of-the-box defaults, and competitive accuracy make it a compelling choice for a wide range of projects.

To master CatBoost, practice is key. Experiment with different datasets, hyperparameters, and techniques to unlock its full potential. In your journey to Python machine learning excellence, CatBoost will be your trusty companion, ready to take on challenging real-world problems with you.

So, continue your exploration, fine-tuning, and experimentation with CatBoost. You’re on the path to becoming a Python machine learning pro, and CatBoost is your shortcut to success. Happy learning and coding!

This guide has taken you from the fundamentals to advanced techniques of CatBoost in Python 3, with practical examples and plots. It’s been quite a journey, and you’re well on your way to becoming a pro in the Python machine learning world. Keep the curiosity alive, keep experimenting, and you’ll achieve greatness in no time.

Also, check out our other playlists: Rasa Chatbot, Internet of Things, Docker, Python Programming, MQTT, Tech News, ESP-IDF, etc.
Become a member of our social family on YouTube.
Stay tuned and Happy Learning. ✌🏻😃
Happy coding! ❤️🔥