Data Visualization and Machine Learning

Time: 2019-05-08 06:18:24

Tags: machine-learning data-visualization

Preprocess the data and show the results before and after preprocessing (reported as accuracy).
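The question does not say which model the accuracy should come from. A minimal sketch, assuming a scikit-learn logistic regression scored with 3-fold cross-validation (both are my choices, not part of the question): "before" drops the rows with blanks, because the model cannot take NaN, and "after" fills the blanks with the column means, as the answer below does. The data is the table from the question typed in directly.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# The 20 rows from the question's table; None marks a blank cell.
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany", "France", "Spain",
                "France", "Germany", "France", "France", "Spain", "Germany", "Spain",
                "Germany", "France", "Spain", "France", "Germany", "France"],
    "Age": [44, 27, 30, 38, 40, 35, None, 48, 50, 37,
            None, 17, None, 38, 50, 35, None, 23, 55, None],
    "Salary": [72000, 48000, 54000, 61000, None, 58000, 52000, 79000, 83000, None,
               18888, 67890, 12000, 98888, None, 58000, 12345, None, 78456, 43215],
    "Purchased": ["No", "Yes", "No", "No", "Yes", "Yes", "No", "Yes", "No", "Yes",
                  "No", "Yes", "No", "No", "Yes", "Yes", "No", "Yes", "No", "Yes"],
})


def cv_accuracy(frame):
    """Mean 3-fold cross-validated accuracy of an assumed logistic-regression model."""
    # One-hot encode Country; Age and Salary stay numeric.
    X = pd.get_dummies(frame[["Country", "Age", "Salary"]], columns=["Country"], dtype=float)
    y = (frame["Purchased"] == "Yes").astype(int)
    model = make_pipeline(StandardScaler(), LogisticRegression())
    return cross_val_score(model, X, y, cv=3, scoring="accuracy").mean()


# Before preprocessing: the model cannot take NaN, so only complete rows are usable.
raw = df.dropna()
before_acc = cv_accuracy(raw)

# After preprocessing: fill blanks with the column mean, keeping all 20 rows.
filled = df.fillna({"Age": df["Age"].mean(), "Salary": df["Salary"].mean()})
after_acc = cv_accuracy(filled)

print("missing cells before:", int(df.isnull().sum().sum()),
      "after:", int(filled.isnull().sum().sum()))
print(f"accuracy before ({len(raw)} rows): {before_acc:.2f}")
print(f"accuracy after  ({len(filled)} rows): {after_acc:.2f}")
```

With only 20 rows (11 after dropping blanks) both scores swing from fold to fold, so treat them as a demonstration of the workflow rather than a meaningful comparison.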

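Plot the following charts:

  • Correlation heatmap

  • Missing-value heatmap

  • Line chart / scatter plot of Country vs Purchased, Age vs Purchased, and Salary vs Purchased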
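The two heatmaps are not covered by the answer below. A minimal sketch using only pandas and matplotlib (no seaborn), assuming the table is typed in directly and that one-hot encoding Country and mapping Purchased to 1/0 is an acceptable way to get numbers for `corr()`. The output file name `heatmaps.png` is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line to get windows
import matplotlib.pyplot as plt
import pandas as pd

# The 20 rows from the question's table; None marks a blank cell.
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany", "France", "Spain",
                "France", "Germany", "France", "France", "Spain", "Germany", "Spain",
                "Germany", "France", "Spain", "France", "Germany", "France"],
    "Age": [44, 27, 30, 38, 40, 35, None, 48, 50, 37,
            None, 17, None, 38, 50, 35, None, 23, 55, None],
    "Salary": [72000, 48000, 54000, 61000, None, 58000, 52000, 79000, 83000, None,
               18888, 67890, 12000, 98888, None, 58000, 12345, None, 78456, 43215],
    "Purchased": ["No", "Yes", "No", "No", "Yes", "Yes", "No", "Yes", "No", "Yes",
                  "No", "Yes", "No", "No", "Yes", "Yes", "No", "Yes", "No", "Yes"],
})

# Correlation needs numbers: one-hot encode Country, map Purchased Yes/No to 1/0.
encoded = pd.get_dummies(df, columns=["Country"], dtype=float)
encoded["Purchased"] = (encoded["Purchased"] == "Yes").astype(int)
corr = encoded.corr()  # pairwise: NaNs are skipped for each column pair

# Missing-value mask: 1 where a cell is blank, 0 otherwise.
missing = df.isnull().astype(int)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

im = ax1.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax1.set_xticks(range(len(corr.columns)))
ax1.set_xticklabels(corr.columns, rotation=45, ha="right")
ax1.set_yticks(range(len(corr.index)))
ax1.set_yticklabels(corr.index)
ax1.set_title("Correlation heatmap")
fig.colorbar(im, ax=ax1)

ax2.imshow(missing.to_numpy(), aspect="auto", cmap="Greys", interpolation="nearest")
ax2.set_xticks(range(len(missing.columns)))
ax2.set_xticklabels(missing.columns)
ax2.set_ylabel("Row")
ax2.set_title("Missing values (dark = blank)")

fig.tight_layout()
fig.savefig("heatmaps.png")
```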

Country Age Salary  Purchased
France  44  72000   No
Spain   27  48000   Yes
Germany 30  54000   No
Spain   38  61000   No
Germany 40          Yes
France  35  58000   Yes
Spain       52000   No
France  48  79000   Yes
Germany 50  83000   No
France  37          Yes
France      18888   No
Spain   17  67890   Yes
Germany     12000   No
Spain   38  98888   No
Germany 50          Yes
France  35  58000   Yes
Spain       12345   No
France  23          Yes
Germany 55  78456   No
France      43215   Yes

1 answer:

Answer 0 (score: 0)

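Sometimes it is hard to learn anything from a scatter plot such as Country vs Purchased: there are only three countries in your list, so many points are drawn on top of each other. A heatmap here may be more helpful.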
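One way to follow that suggestion is a count heatmap of Country against Purchased. A minimal sketch, assuming `pd.crosstab` for the counts and matplotlib's `imshow` for the drawing; the data is the two columns from the question's table, and the file name is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to display a window
import matplotlib.pyplot as plt
import pandas as pd

# Country and Purchased columns from the question's table.
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany", "France", "Spain",
                "France", "Germany", "France", "France", "Spain", "Germany", "Spain",
                "Germany", "France", "Spain", "France", "Germany", "France"],
    "Purchased": ["No", "Yes", "No", "No", "Yes", "Yes", "No", "Yes", "No", "Yes",
                  "No", "Yes", "No", "No", "Yes", "Yes", "No", "Yes", "No", "Yes"],
})

# Count how many rows fall into each (Country, Purchased) cell.
counts = pd.crosstab(df["Country"], df["Purchased"])

fig, ax = plt.subplots()
im = ax.imshow(counts.to_numpy(), cmap="Blues")
ax.set_xticks(range(len(counts.columns)))
ax.set_xticklabels(counts.columns)
ax.set_yticks(range(len(counts.index)))
ax.set_yticklabels(counts.index)
ax.set_xlabel("Purchased")
ax.set_ylabel("Country")
ax.set_title("Country vs Purchased (row counts)")

# Write each count into its cell, so points that overlap in a scatter plot are visible.
for i in range(counts.shape[0]):
    for j in range(counts.shape[1]):
        ax.text(j, i, str(counts.iat[i, j]), ha="center", va="center")

fig.colorbar(im, ax=ax)
fig.savefig("country_vs_purchased.png")
```

For this data the France/Yes cell reads 6, while the scatter plot draws those six rows as one dot.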
import pandas as pd
from matplotlib import pyplot as plt

# read the CSV with pandas

df = pd.read_csv('Data.csv')
copydf = df.copy()  # independent copy of the raw data, not just a second name for df

#before data preprocessing
print(copydf)

# fill NaN values with the column mean of Age and Salary

df['Age'] = df['Age'].fillna(df['Age'].mean())
df['Salary'] = df['Salary'].fillna(df['Salary'].mean())

#after data preprocessing
print(df)

plt.figure(1)

#  Country Vs Purchased
plt.subplot(221)
plt.scatter(df['Country'], df['Purchased'])
plt.title('Country vs Purchased')
plt.grid(True)


#  Age Vs Purchased
plt.subplot(222)
plt.scatter(df['Age'], df['Purchased'])
plt.title('Age vs Purchased')
plt.grid(True)


# Salary Vs Purchased
plt.subplot(223)
plt.scatter(df['Salary'], df['Purchased'])
plt.title('Salary vs Purchased')
plt.grid(True)

plt.subplots_adjust(top=0.92, bottom=0.08, left=0.10, right=0.95, hspace=0.75,
                    wspace=0.5)

plt.show()
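The question also mentions a line chart, which the scatter plots above do not produce. A minimal sketch of one option, assuming Purchased is turned into a 0/1 column and averaged per age band; the band edges 15/30/45/60 are an arbitrary choice, and Salary could be banded the same way.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to display a window
import matplotlib.pyplot as plt
import pandas as pd

# Age and Purchased columns from the question's table; None marks a blank cell.
df = pd.DataFrame({
    "Age": [44, 27, 30, 38, 40, 35, None, 48, 50, 37,
            None, 17, None, 38, 50, 35, None, 23, 55, None],
    "Purchased": ["No", "Yes", "No", "No", "Yes", "Yes", "No", "Yes", "No", "Yes",
                  "No", "Yes", "No", "No", "Yes", "Yes", "No", "Yes", "No", "Yes"],
})

# Same preprocessing as the answer: fill blank ages with the column mean.
df["Age"] = df["Age"].fillna(df["Age"].mean())
df["Bought"] = (df["Purchased"] == "Yes").astype(int)

# Share of buyers per (right-closed) age band.
bands = pd.cut(df["Age"], bins=[15, 30, 45, 60])
rate = df.groupby(bands, observed=False)["Bought"].mean()

fig, ax = plt.subplots()
ax.plot(range(len(rate)), rate.to_numpy(), marker="o")
ax.set_xticks(range(len(rate)))
ax.set_xticklabels([str(band) for band in rate.index])
ax.set_ylim(0, 1)
ax.set_xlabel("Age band")
ax.set_ylabel("Share with Purchased = Yes")
ax.set_title("Age vs Purchased (line chart)")
ax.grid(True)
fig.savefig("age_vs_purchased_line.png")
```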