# Forked from INRIA/scikit-learn-mooc
# coding: utf-8
# %% [markdown]
# # 📝 Exercise M5.01
#
# In the previous notebook, we showed how a tree with a depth of 1 level
# works. The aim of this exercise is to repeat part of the previous
# experiment for a tree with a depth of 2 levels, to show how the partitioning
# process is repeated.
#
# Before starting, we will:
#
# * load the dataset;
# * split the dataset into training and testing sets;
# * define the function to show the classification decision function.
# %%
import pandas as pd
penguins = pd.read_csv("../datasets/penguins_classification.csv")
culmen_columns = ["Culmen Length (mm)", "Culmen Depth (mm)"]
target_column = "Species"
# %% [markdown]
# ```{note}
# If you want a deeper overview regarding this dataset, you can refer to the
# Appendix - Datasets description section at the end of this MOOC.
# ```
# %%
from sklearn.model_selection import train_test_split
data, target = penguins[culmen_columns], penguins[target_column]
data_train, data_test, target_train, target_test = train_test_split(
    data, target, random_state=0
)
range_features = {
    feature_name: (data[feature_name].min() - 1, data[feature_name].max() + 1)
    for feature_name in data.columns
}
# %%
import numpy as np
import matplotlib.pyplot as plt
def plot_decision_function(fitted_classifier, range_features, ax=None):
    """Plot the boundary of the decision function of a classifier."""
    from sklearn.preprocessing import LabelEncoder

    feature_names = list(range_features.keys())
    # create a grid to evaluate all possible samples
    plot_step = 0.02
    xx, yy = np.meshgrid(
        np.arange(*range_features[feature_names[0]], plot_step),
        np.arange(*range_features[feature_names[1]], plot_step),
    )

    # compute the associated prediction
    Z = fitted_classifier.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = LabelEncoder().fit_transform(Z)
    Z = Z.reshape(xx.shape)

    # make the plot of the boundary and the data samples
    if ax is None:
        _, ax = plt.subplots()
    ax.contourf(xx, yy, Z, alpha=0.4, cmap="RdBu")

    return ax
# %% [markdown]
# Create a decision tree classifier with a maximum depth of 2 levels and fit
# it on the training data. Once this classifier is trained, plot the data and
# the decision boundary to see the benefit of increasing the depth.
# %%
# Write your code here.
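# %% [markdown]
# The step above could be sketched as follows. This is one possible approach,
# not the reference solution. To stay self-contained, it builds a hypothetical
# synthetic stand-in for the penguins training set (three Gaussian clusters
# with made-up centers) and redraws the decision regions inline. In the
# notebook itself, you would fit on `data_train` and `target_train` and call
# `plot_decision_function(tree, range_features)` instead.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# ASSUMPTION: synthetic stand-in for the penguins data (made-up cluster
# centers); only the column names are taken from the exercise.
rng = np.random.default_rng(0)
centers = {"Adelie": (39, 18), "Chinstrap": (49, 18), "Gentoo": (47, 15)}
culmen_columns = ["Culmen Length (mm)", "Culmen Depth (mm)"]
data_train = pd.DataFrame(
    np.vstack([rng.normal(c, 1.0, size=(50, 2)) for c in centers.values()]),
    columns=culmen_columns,
)
target_train = pd.Series(np.repeat(list(centers), 50), name="Species")

# a tree limited to 2 levels can create at most 4 leaves
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(data_train, target_train)

# evaluate the fitted tree on a grid to draw the decision regions
xx, yy = np.meshgrid(np.arange(34, 55, 0.1), np.arange(11, 23, 0.1))
grid = pd.DataFrame(np.c_[xx.ravel(), yy.ravel()], columns=culmen_columns)
Z = LabelEncoder().fit_transform(tree.predict(grid)).reshape(xx.shape)

_, ax = plt.subplots()
ax.contourf(xx, yy, Z, alpha=0.4, cmap="RdBu")
# overlay the training samples, one color per class
for species, group in data_train.groupby(target_train):
    ax.scatter(group[culmen_columns[0]], group[culmen_columns[1]], label=species)
ax.set_xlabel(culmen_columns[0])
ax.set_ylabel(culmen_columns[1])
ax.legend()
```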
# %% [markdown]
# Did we make use of the feature "Culmen Length"?
# Plot the tree using the function `sklearn.tree.plot_tree` to find out!
# %%
# Write your code here.
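# %% [markdown]
# A minimal sketch of the tree inspection, again on a hypothetical synthetic
# stand-in for the penguins data rather than the real CSV. It draws the tree
# with `sklearn.tree.plot_tree` and, as a cross-check, reads the feature index
# stored at each split node to list the features the tree actually used. The
# result on the real dataset may differ from what this synthetic data gives.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, plot_tree

# ASSUMPTION: synthetic stand-in for the penguins data (made-up cluster
# centers); only the column names are taken from the exercise.
rng = np.random.default_rng(0)
centers = {"Adelie": (39, 18), "Chinstrap": (49, 18), "Gentoo": (47, 15)}
culmen_columns = ["Culmen Length (mm)", "Culmen Depth (mm)"]
data_train = pd.DataFrame(
    np.vstack([rng.normal(c, 1.0, size=(50, 2)) for c in centers.values()]),
    columns=culmen_columns,
)
target_train = np.repeat(list(centers), 50)

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(data_train, target_train)

# draw the tree; each node shows its split rule and majority class
_, ax = plt.subplots(figsize=(8, 6))
annotations = plot_tree(
    tree,
    feature_names=culmen_columns,
    class_names=list(tree.classes_),
    impurity=False,
    ax=ax,
)

# tree_.feature stores the feature index of each split node;
# leaf nodes carry a negative sentinel value, so keep indices >= 0
split_indices = tree.tree_.feature[tree.tree_.feature >= 0]
used_features = sorted({culmen_columns[i] for i in split_indices})
print(f"Features used by the tree: {used_features}")
```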
# %% [markdown]
# Compute the accuracy of the decision tree on the testing data.
# %%
# Write your code here.
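# %% [markdown]
# The accuracy computation can be sketched as below, under the same assumption
# of a hypothetical synthetic stand-in for the penguins data. For classifiers,
# `score` returns the mean accuracy, so it should agree with
# `sklearn.metrics.accuracy_score` applied to the predictions. In the
# notebook, `tree.score(data_test, target_test)` on the real split is enough.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# ASSUMPTION: synthetic stand-in for the penguins data (made-up cluster
# centers); only the column names are taken from the exercise.
rng = np.random.default_rng(0)
centers = {"Adelie": (39, 18), "Chinstrap": (49, 18), "Gentoo": (47, 15)}
culmen_columns = ["Culmen Length (mm)", "Culmen Depth (mm)"]
data = pd.DataFrame(
    np.vstack([rng.normal(c, 1.0, size=(50, 2)) for c in centers.values()]),
    columns=culmen_columns,
)
target = pd.Series(np.repeat(list(centers), 50), name="Species")

# same splitting call as in the exercise
data_train, data_test, target_train, target_test = train_test_split(
    data, target, random_state=0
)

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(data_train, target_train)

# mean accuracy on the held-out samples, computed in two equivalent ways
test_score = tree.score(data_test, target_test)
same_score = accuracy_score(target_test, tree.predict(data_test))
print(f"Accuracy of the DecisionTreeClassifier: {test_score:.2f}")
```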