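Preprocess the data and show the results before and after preprocessing (reported as accuracy).
Plot the following charts:
Correlation heatmap
Missing-value heatmap
Line/scatter plots of Country vs Purchased, Age vs Purchased, and Salary vs Purchased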
Country  Age  Salary  Purchased
France   44   72000   No
Spain    27   48000   Yes
Germany  30   54000   No
Spain    38   61000   No
Germany  40           Yes
France   35   58000   Yes
Spain         52000   No
France   48   79000   Yes
Germany  50   83000   No
France   37           Yes
France        18888   No
Spain    17   67890   Yes
Germany       12000   No
Spain    38   98888   No
Germany  50           Yes
France   35   58000   Yes
Spain         12345   No
France   23           Yes
Germany  55   78456   No
France        43215   Yes
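
One way to make the "before and after" comparison concrete is to count missing values per column before and after imputation. This is only a minimal sketch: it assumes the blank cells in the table are missing values, uses just the first seven rows typed inline as CSV, and picks mean imputation as one common choice among several.

```python
import io

import pandas as pd

# Assumption: the first seven rows of the table, with blank cells as empty CSV fields.
raw = """Country,Age,Salary,Purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,54000,No
Spain,38,61000,No
Germany,40,,Yes
France,35,58000,Yes
Spain,,52000,No
"""
df = pd.read_csv(io.StringIO(raw))

# Missing values per column before preprocessing.
missing_before = df.isna().sum()
print(missing_before)

# Impute the numeric columns with their column mean on an independent copy.
filled = df.copy()
for col in ["Age", "Salary"]:
    filled[col] = filled[col].fillna(filled[col].mean())

# Missing values per column after preprocessing.
missing_after = filled.isna().sum()
print(missing_after)
```

Comparing `missing_before` with `missing_after` shows which columns were touched without having to scan the printed frames by eye.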
Answer 0 (score: 0)
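Scatter plots such as Country vs Purchased are sometimes hard to read: your list contains only three countries, so the points fall on top of one another. A heatmap may help here.
import pandas as pd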
from matplotlib import pyplot as plt
# Read the CSV with pandas
df = pd.read_csv('Data.csv')
# Keep an independent copy; plain assignment would only alias df
copydf = df.copy()
# Before data preprocessing
print(copydf)
# Fill NaN values in Age and Salary with the column mean
df['Age'] = df['Age'].fillna(df['Age'].mean())
df['Salary'] = df['Salary'].fillna(df['Salary'].mean())
# After data preprocessing
print(df)
plt.figure(1)
# Country Vs Purchased
plt.subplot(221)
plt.scatter(df['Country'], df['Purchased'])
plt.title('Country vs Purchased')
plt.grid(True)
# Age Vs Purchased
plt.subplot(222)
plt.scatter(df['Age'], df['Purchased'])
plt.title('Age vs Purchased')
plt.grid(True)
# Salary Vs Purchased
plt.subplot(223)
plt.scatter(df['Salary'], df['Purchased'])
plt.title('Salary vs Purchased')
plt.grid(True)
plt.subplots_adjust(top=0.92, bottom=0.08, left=0.10, right=0.95, hspace=0.75,
                    wspace=0.5)
plt.show()
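
As a rough sketch of the heatmap idea, using only pandas and matplotlib, the example below draws a Country vs Purchased count table, a missing-value mask, and a correlation matrix side by side. The inline data is the first ten rows of the question's table with blank cells read as missing. Encoding Purchased as 0/1 for the correlation and the output filename `heatmaps.png` are assumptions made for illustration.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Assumption: first ten rows of the question's table, blank cells as empty fields.
raw = """Country,Age,Salary,Purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,54000,No
Spain,38,61000,No
Germany,40,,Yes
France,35,58000,Yes
Spain,,52000,No
France,48,79000,Yes
Germany,50,83000,No
France,37,,Yes
"""
df = pd.read_csv(io.StringIO(raw))

# Counts of each Country/Purchased combination (replaces the overlapping scatter).
counts = pd.crosstab(df["Country"], df["Purchased"])

# Boolean mask of missing cells: True where a value is absent.
missing = df.isna()

# Correlation of the numeric columns, with Purchased encoded as Yes=1, No=0.
numeric = df[["Age", "Salary"]].assign(
    Purchased=(df["Purchased"] == "Yes").astype(int)
)
corr = numeric.corr()

panels = [
    (counts, "Country vs Purchased (counts)"),
    (missing, "Missing values"),
    (corr, "Correlation"),
]
fig, axes = plt.subplots(1, 3, figsize=(13, 4))
for ax, (table, title) in zip(axes, panels):
    # Draw each table as a colour grid with its row and column labels.
    im = ax.imshow(table.to_numpy(dtype=float), aspect="auto", cmap="viridis")
    ax.set_xticks(range(table.shape[1]))
    ax.set_xticklabels(table.columns)
    ax.set_yticks(range(table.shape[0]))
    ax.set_yticklabels(table.index)
    ax.set_title(title)
    fig.colorbar(im, ax=ax)
fig.tight_layout()
fig.savefig("heatmaps.png")  # hypothetical output path
```

In an interactive session, `plt.show()` can be used instead of `fig.savefig`. The count table makes the Country vs Purchased relationship readable because each cell's colour carries the number of rows that a single scatter point would hide.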