Plotting a Pandas DataFrame of char data with matplotlib

Time: 2017-12-19 17:20:04

Tags: python pandas matplotlib

The data in the dataset consists purely of characters. For example:

p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u
e,x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,n,g
e,b,s,w,t,l,f,c,b,n,e,c,s,s,w,w,p,w,o,p,n,n,m
p,x,y,w,t,p,f,c,n,n,e,e,s,s,w,w,p,w,o,p,k,s,u
e,x,s,g,f,n,f,w,b,k,t,e,s,s,w,w,p,w,o,e,n,a,g

A full copy of the data can be found in agaricus-lepiota.data, in the mushroom dataset of the UCI machine learning datasets.

Is there a way to visualize char data with matplotlib, without having to convert the dataset to numbers?

Any form of visualization would do, i.e.:

import pandas as pd

filename = 'mushrooms.csv'
df_mushrooms = pd.read_csv(filename, names = ["Classes", "Cap-Shape", "Cap-Surface", "Cap-Colour", "Bruises", "Odor", "Gill-Attachment", "Gill-Spacing", "Gill-Size", "Gill-Colour", "Stalk-Shape", "Stalk-Root", "Stalk-Surface-Above-Ring", "Stalk-Surface-Below-Ring", "Stalk-Colour-Above-Ring", "Stalk-Colour-Below-Ring", "Veil-Type", "Veil-Colour", "Ring-Number", "Ring-Type", "Spore-Print-Colour", "Population", "Habitat"])


# If there are any entries (rows) with missing values/NaNs, drop the row.
df_mushrooms.dropna(axis = 0, how = 'any', inplace = True)

df_mushrooms.plot.scatter(x = 'Classes', y = 'Cap-Shape')

1 answer:

Answer 0: (score: 1)

It can be done, but from a graphical standpoint this approach doesn't make much sense. If you do exactly what you asked for, it looks like this:

[image]

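I know I'm not supposed to tell someone how to present their graph, but this one doesn't convey any information to me. The problem is that using the Classes and Cap-Shape fields as the x and y positions will always put the same letter in the same spot. There is no variation. Perhaps there is some other field you could use for the position and then use Cap-Shape as the marker, but as it stands it doesn't add any value. That's just my personal opinion.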
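To make the overlap concrete, here is a minimal sketch that counts how many rows land on each (Classes, Cap-Shape) position. It uses only the first two columns of the five sample rows from the question; the names `sample` and `pair_counts` are illustrative, not from the original post.

```python
import pandas as pd

# First two columns (Classes, Cap-Shape) of the five sample rows in the question
sample = pd.DataFrame({
    "Classes": ["p", "e", "e", "p", "e"],
    "Cap-Shape": ["x", "x", "b", "x", "x"],
})

# Count how many rows fall on each (Classes, Cap-Shape) position
pair_counts = sample.groupby(["Classes", "Cap-Shape"]).size()
print(pair_counts)
print("rows:", len(sample), "distinct positions:", len(pair_counts))
```

Every row that shares a pair is drawn on top of the others, so the scatter plot can only show which pairs occur, not how often. A count table like this one is probably closer to what the plot should communicate.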

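To use strings as markers you can use the "$...$" marker described in matplotlib.markers, but again I have to add a caveat: plotting this way is much slower than the conventional approach, because you have to iterate over the rows of the DataFrame.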

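import matplotlib.pyplot as plt

df = df_mushrooms  # the DataFrame loaded in the question

fig, ax = plt.subplots()
# Classes only has 'p' and 'e' as unique values, so map them to 1 and 2 on the x axis
df['Class_Id'] = df.Classes.map(lambda x: 1 if x == 'p' else 2)
# Map each Cap-Shape letter to its position in the alphabet (a=1, ..., z=26)
df['Cap_Val'] = df['Cap-Shape'].map(lambda x: ord(x) - 96)
for idx, row in df.iterrows():
    # One scatter call per row, because each point needs its own text marker
    ax.scatter(x=row.Class_Id, y=row.Cap_Val, marker=r"$ {} $".format(row['Cap-Shape']), color=plt.cm.nipy_spectral(row.Cap_Val / 26))
ax.set_xticks([0, 1, 2, 3])
ax.set_xticklabels(['', 'p', 'e', ''])
# Pin the y ticks explicitly so the letter labels line up with the alphabet positions
ax.set_yticks([0, 5, 10, 15, 20, 25])
ax.set_yticklabels(['', 'e', 'j', 'o', 't', 'y'])
plt.show()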
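Since every row with the same Cap-Shape gets the same marker, the per-row loop can probably be avoided by grouping on the letter and calling scatter once per distinct value. The following is a sketch under stated assumptions, not part of the original answer: it assumes every value is a single lowercase letter, the helper name `scatter_letters` and the `sample` frame are hypothetical, and it numbers the x categories in sorted order (so 'e' is 1 and 'p' is 2, unlike the code above).

```python
import matplotlib.pyplot as plt
import pandas as pd

# First two columns of the five sample rows from the question (illustrative data)
sample = pd.DataFrame({
    "Classes": ["p", "e", "e", "p", "e"],
    "Cap-Shape": ["x", "x", "b", "x", "x"],
})


def scatter_letters(df, x_col, y_col, ax):
    """Draw one scatter call per distinct y_col letter instead of one per row."""
    # Number the x categories 1, 2, ... in sorted order
    x_codes = {v: i + 1 for i, v in enumerate(sorted(df[x_col].unique()))}
    for letter, group in df.groupby(y_col):
        y_val = ord(letter) - 96  # position in the alphabet, a=1 ... z=26
        ax.scatter(
            group[x_col].map(x_codes),
            [y_val] * len(group),
            marker=f"${letter}$",  # mathtext marker that renders the letter itself
            color=plt.cm.nipy_spectral(y_val / 26),
        )
    ax.set_xticks(list(x_codes.values()))
    ax.set_xticklabels(list(x_codes.keys()))
    return x_codes


fig, ax = plt.subplots()
x_codes = scatter_letters(sample, "Classes", "Cap-Shape", ax)
```

Call `plt.show()` afterwards to display the figure. On the full dataset this issues one scatter call per distinct Cap-Shape value rather than one per row, though the points still overlap exactly as described above.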