[ADP Practical Exam] (Visualization) Drawing Graphs in Python (Boxplot, Scatter, Pairplot, Heatmap, etc.)

나르시스트 2026. 4. 5. 17:26

*Bar chart (nominal scale: frequency counts)

import matplotlib.pyplot as plt

# wine is assumed to be a DataFrame with a categorical 'Class' column, loaded beforehand
wine_type = wine['Class'].value_counts()
plt.bar(wine_type.index, wine_type.values, width = 0.8, bottom = None, align = 'center')
plt.show()
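Since `wine` is not defined in the snippet above, here is a minimal sketch of the same idea with a tiny made-up DataFrame (the values are hypothetical, not the real wine data). It also shows that `value_counts()` orders the result by frequency, so `sort_index()` is needed if the bars should follow the class labels instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to view the figure with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the `wine` DataFrame
wine = pd.DataFrame({"Class": [1, 1, 2, 2, 2, 3]})

# value_counts() returns the counts sorted from most to least frequent
wine_type = wine["Class"].value_counts()

# sort_index() reorders the counts by class label instead of by frequency
wine_type_sorted = wine_type.sort_index()

fig, ax = plt.subplots()
# Convert the labels to strings so they are treated as categories, not numbers
bars = ax.bar(wine_type_sorted.index.astype(str), wine_type_sorted.values, width=0.8)
ax.set_xlabel("Class")
ax.set_ylabel("Count")
```

With this toy data the frequency order is class 2, 1, 3, while the sorted version draws the bars as 1, 2, 3.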

import seaborn as sns
import matplotlib.pyplot as plt

titanic = sns.load_dataset("titanic")

plt.figure(figsize=(8, 6))
sns.countplot(x='sex', hue='class', data=titanic)
plt.title('Number of Passengers by Gender and Class')
plt.xlabel('Gender')
plt.ylabel('Count')
plt.show()

*Histogram (continuous variable: frequency counts)

import matplotlib.pyplot as plt

plt.title('Wine alcohol histogram')
plt.hist('alcohol', bins=8, range=(11, 15), color='purple', data = wine)
plt.show()
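To see what `bins=8` and `range=(11, 15)` actually do, the sketch below runs the same call on a handful of made-up alcohol values (hypothetical, standing in for `wine['alcohol']`) and inspects the returned counts and bin edges.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to view the figure with plt.show()
import matplotlib.pyplot as plt

# Hypothetical alcohol values; 15.6 lies outside the requested range
alcohol = np.array([11.2, 11.9, 12.4, 12.6, 13.1, 13.3, 13.4, 14.2, 14.8, 15.6])

fig, ax = plt.subplots()
# hist returns the count per bin, the bin edges, and the drawn bar patches
counts, edges, patches = ax.hist(alcohol, bins=8, range=(11, 15), color="purple")
ax.set_title("Alcohol histogram (toy data)")

bin_width = edges[1] - edges[0]
```

The range 11 to 15 is split into 8 equal bins, so each bin is 0.5 wide, and values outside the range (here 15.6) are silently dropped rather than placed in the last bin.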

*Box Plot: continuous variables (descriptive statistics)

plt.boxplot(iris.drop(columns='class'))   # or: iris[['sepal width (cm)', 'class']].boxplot(by='class')
plt.show()
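A box plot is a picture of a few descriptive statistics, so it can help to compute them by hand once. The sketch below uses made-up numbers (hypothetical data) and assumes matplotlib's default whisker rule of 1.5 times the IQR; it then checks that the point matplotlib draws as an outlier matches the manual calculation.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to view the figure with plt.show()
import matplotlib.pyplot as plt

# Hypothetical data; 30 is far from the rest
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])

# The box spans Q1 to Q3, with a line at the median
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are drawn individually as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

fig, ax = plt.subplots()
result = ax.boxplot(values)
# The 'fliers' entry holds the outlier markers matplotlib actually drew
fliers = result["fliers"][0].get_ydata()
```

Here Q1 is 3.25, the median is 5.5 and Q3 is 7.75, so the upper fence sits at 14.5 and only the value 30 is flagged.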

*Scatter Plot (X-Y)

plt.scatter(x = iris['sepal length (cm)'], y = iris['sepal width (cm)'], alpha=0.5)
plt.show()

# scatterplot is a seaborn function (import seaborn as sns), not a matplotlib one
sns.scatterplot(x='sepal length (cm)', y='sepal width (cm)', data=iris, hue='class', style='class')
plt.show()

*Drawing horizontal and vertical lines

plt.hlines(-6, -10, 10, color='grey')
plt.vlines(-6, -10, 10, color='red', linestyles='dashed')
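The argument order of these two calls is easy to mix up, so here is a small sketch that spells it out and reads the drawn segments back. The `axhline`/`axvline` calls are an addition for comparison, not part of the snippet above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to view the figure with plt.show()
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# hlines(y, xmin, xmax): horizontal segment at y = -6 from x = -10 to x = 10
h = ax.hlines(-6, -10, 10, color="grey")

# vlines(x, ymin, ymax): vertical segment at x = -6 from y = -10 to y = 10
v = ax.vlines(-6, -10, 10, color="red", linestyles="dashed")

# axhline / axvline instead span the whole axes, whatever the data limits are
ax.axhline(0, color="black", linewidth=0.8)
ax.axvline(0, color="black", linewidth=0.8)

# Each call returns a LineCollection; its segments are the (x, y) end points
h_segment = h.get_segments()[0].tolist()
v_segment = v.get_segments()[0].tolist()
```

So the first argument is always the fixed coordinate, and the next two are the start and end along the other axis.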

*Plotting a function

def linear_func(x):
    return 2*x + 1

X = iris['sepal length (cm)']
plt.plot(X, linear_func(X), c='#789395')
plt.show()

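*Regression curve
※ When drawing a curve of degree 2 or higher, the data must be sorted by X first. plt.plot connects points in row order, so unsorted X values make the curve zigzag back and forth.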

import numpy as np

iris2 = iris.sort_values(by = 'sepal length (cm)')  # sort by X so the curve is drawn left to right
X, Y = iris2['sepal length (cm)'], iris2['petal length (cm)']
b2, b1, b0 = np.polyfit(X, Y, 2)  # coefficients are returned highest degree first
plt.scatter(x = X, y = Y, alpha=0.5)
plt.plot(X, b0 + b1*X + b2*X**2, color='red')
plt.show()
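Two details in the regression snippet are worth checking on a toy example: the order of the coefficients returned by `np.polyfit`, and the sorting step. The sketch below uses made-up, noise-free data that follows y = 1 + 2x + 3x^2 exactly (hypothetical values), so the fitted coefficients are known in advance.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to view the figure with plt.show()
import matplotlib.pyplot as plt

# Hypothetical data on the curve y = 1 + 2x + 3x^2, deliberately unsorted
X = np.array([3.0, -1.0, 2.0, 0.0, 1.0, -2.0])
Y = 1 + 2 * X + 3 * X**2

# polyfit returns the coefficients from the highest degree down: [b2, b1, b0]
b2, b1, b0 = np.polyfit(X, Y, 2)

# Sort by X so that plot() connects the fitted points from left to right
order = np.argsort(X)
X_sorted = X[order]
Y_fit = np.polyval([b2, b1, b0], X_sorted)

fig, ax = plt.subplots()
ax.scatter(X, Y, alpha=0.5)
ax.plot(X_sorted, Y_fit, color="red")
```

`np.polyval` takes the coefficients in the same highest-degree-first order, which avoids writing out `b0 + b1*X + b2*X**2` by hand.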

*Visualizing correlations

import pandas as pd
from sklearn.datasets import load_iris
from pandas.plotting import scatter_matrix
import matplotlib.pyplot as plt

iris_data = load_iris()
iris = pd.DataFrame(iris_data.data, columns=iris_data.feature_names)

scatter_matrix(iris, alpha=0.5, figsize=(8, 8), diagonal='hist')
plt.show()

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['target'] = iris.target

sns.pairplot(df, hue='target', diag_kind='kde')
plt.suptitle('Scatter Plot by Class', y=1.02)  # figure-level title; plt.title would label only the last subplot
plt.show()

*Correlation matrix heatmap

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['target'] = iris.target

plt.figure(figsize=(5, 4))
correlation_matrix = df.corr()
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix')
plt.show()
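To make the numbers in the heatmap easier to read, here is a small sketch of what `df.corr()` returns, using a made-up three-column DataFrame (hypothetical data) in which the relationships are obvious.

```python
import pandas as pd

# Hypothetical data: b rises with a, c falls as a rises
df_demo = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 4, 3, 2, 1],
})

# corr() computes pairwise Pearson correlation by default
corr = df_demo.corr()

# The matrix is symmetric with ones on the diagonal
is_symmetric = (corr.round(12) == corr.T.round(12)).all().all()
```

Correlations always fall between -1 and 1, so passing `vmin=-1, vmax=1` to `sns.heatmap` keeps the color scale comparable across datasets. Note also that in the iris example above the `target` column is included in the matrix, and it is a class label rather than a measurement.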

*Pandas Profiling

import pandas as pd
from sklearn.datasets import load_iris
from pandas_profiling import ProfileReport

iris = load_iris()
iris = pd.DataFrame(iris.data, columns=iris.feature_names)
iris['Class'] = load_iris().target
iris['Class'] = iris['Class'].map({0: 'Setosa', 1:'Versicolour', 2:'Virginica'})

ProfileReport(iris)

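→ This looks like a convenient feature, but the package has since been renamed to ydata-profiling, so the import above needs updating (from ydata_profiling import ProfileReport).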

 

*Drawing a 2×2 grid of plots

from matplotlib import pyplot as plt
import seaborn as sns

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 10))
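# x, y and the other pairs (x_under, y_under, ...) are assumed to be feature/label arrays defined beforehand
# Recent seaborn versions require x= and y= as keyword arguments
sns.scatterplot(x=x[:,1], y=x[:,2], hue=y, ax=axes[0][0], alpha=0.5)
sns.scatterplot(x=x_under[:,1], y=x_under[:,2], hue=y_under, ax=axes[0][1], alpha=0.5)
sns.scatterplot(x=x_over[:,1], y=x_over[:,2], hue=y_over, ax=axes[1][0], alpha=0.5)
sns.scatterplot(x=x_sm[:,1], y=x_sm[:,2], hue=y_sm, ax=axes[1][1], alpha=0.5)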

axes[0][0].set_title('Original Data')
...

plt.show()
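The arrays in the snippet above come from elsewhere, so here is a self-contained sketch of the same 2×2 layout with random points. The panel names and data are hypothetical placeholders, and plain matplotlib `scatter` is used instead of seaborn to keep the dependencies small.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to view the figure with plt.show()
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Hypothetical datasets, one per panel
names = ["Panel A", "Panel B", "Panel C", "Panel D"]
datasets = {name: rng.normal(size=(50, 2)) for name in names}

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 10))

# axes is a 2x2 NumPy array of Axes; axes.flat walks it row by row
for ax, (name, points) in zip(axes.flat, datasets.items()):
    ax.scatter(points[:, 0], points[:, 1], alpha=0.5)
    ax.set_title(name)

fig.tight_layout()  # keep titles and tick labels from overlapping
```

Looping over `axes.flat` avoids repeating the `axes[row][col]` indexing for every panel; `axes[1][0]` here is the third panel in the loop.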