import os
import pandas as pd
import numpy as np
import sklearn
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import export_text, export_graphviz
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns
# Load the Palmer penguins dataset bundled with seaborn and take a first look.
penguins = sns.load_dataset('penguins')
print(penguins.shape)
penguins.head()
# Fill missing numeric measurements with each column's mean.
# Assigning the result back avoids the chained `inplace=True` pattern, which newer pandas versions warn about.
num_cols = ['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']
penguins[num_cols] = penguins[num_cols].fillna(penguins[num_cols].mean())
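# Encode sex as 1 for male and 0 otherwise (missing values also become 0).
# The comparison ignores case because the label casing may differ between dataset versions ('MALE' vs 'Male').
penguins['sex'] = penguins['sex'].apply(lambda x: 1 if str(x).upper() == 'MALE' else 0)
# Indicator columns for island; Torgersen is the baseline (both indicators are 0).
penguins['Biscoe'] = penguins['island'].apply(lambda x: 1 if x == 'Biscoe' else 0)
penguins['Dream'] = penguins['island'].apply(lambda x: 1 if x == 'Dream' else 0)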
# Feature matrix X and target y (the species column, which is the first column of the dataset).
colnames = ['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g', 'sex',
            'Biscoe', 'Dream']
X = penguins[colnames]
y = penguins['species']
# Hold out 30% of the rows for testing, then fit a depth-3 tree using the Gini criterion.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=123)
pen_tree = DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=1).fit(X_train, y_train)
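# Draw the fitted tree; with filled=True each node is coloured by its majority class.
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(10, 5), dpi=300)
plotResult = sklearn.tree.plot_tree(pen_tree,
                                    feature_names=colnames,
                                    class_names=list(pen_tree.classes_),
                                    filled=True,
                                    ax=ax)
plt.show()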
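# Accuracy on the held-out test set.
print(pen_tree.score(X_test, y_test))
# Confusion matrix: rows are true species, columns are predicted species, both in the order of pen_tree.classes_.
pred_y = pen_tree.predict(X_test)
print(confusion_matrix(y_test, pred_y, labels=pen_tree.classes_))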
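
The script above imports `export_text` but never uses it, and the printed confusion matrix carries no row or column labels. The following is a minimal sketch of how both could be handled. To keep it self-contained it uses a tiny DataFrame of made-up values (not real penguin measurements), and the names `toy`, `toy_tree`, and `cm_df` are hypothetical. With the real data, `pen_tree`, `colnames`, `y_test`, and `pred_y` would presumably take their place.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up values for illustration only; these are NOT real penguin measurements.
toy = pd.DataFrame({
    "bill_length_mm": [39.0, 38.5, 40.1, 46.5, 47.0, 45.8, 49.5, 50.2, 51.0],
    "flipper_length_mm": [181, 185, 190, 215, 217, 220, 195, 198, 200],
    "species": ["Adelie"] * 3 + ["Gentoo"] * 3 + ["Chinstrap"] * 3,
})
feature_names = ["bill_length_mm", "flipper_length_mm"]
X_toy = toy[feature_names]
y_toy = toy["species"]

# Same hyperparameters as the tree in the script above.
toy_tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=1).fit(X_toy, y_toy)

# Plain-text rendering of the learned split rules.
rules = export_text(toy_tree, feature_names=feature_names)
print(rules)

# Confusion matrix wrapped in a DataFrame so rows (true) and columns (predicted) are labelled.
# It is computed on the training rows here only because the toy data is so small.
labels = list(toy_tree.classes_)
cm = confusion_matrix(y_toy, toy_tree.predict(X_toy), labels=labels)
cm_df = pd.DataFrame(cm,
                     index=[f"true_{name}" for name in labels],
                     columns=[f"pred_{name}" for name in labels])
print(cm_df)
```

The text rules are handy when the figure is too small to read, and the labelled matrix shows at a glance which species are confused with one another.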