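I have paired categorical data, but I don't want to count instances where, for example, "toy" and "B" occur together more than once. I can build a pivot table using counts, but what I want is the equivalent of a 1 or 0 depending on whether the combination of the two values occurs at all, rather than the number of matches (2, 3, 4, etc.).
Here is a sample of the input: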
RS232,1.8,focused,C
RS233,2.8,chew,E
RS234,3.8,toy,D
RS235,4.8,poodle,C
RS236,5.8,winding,E
RS237,6.8,up,D
RS238,7.8,focused,B
RS239,9.8,chew,B
RS240,7.8,toy,B
RS241,6.8,toy,B
RS242,5.8,toy,A
RS243,4.8,focused,A
RS244,9.8,chew,A
RS245,8.8,chew,A
RS246,7.8,chew,C
RS247,6.8,winding,C
RS248,5.8,winding,C
RS249,4.8,winding,D
RS250,3.8,toy,D
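The numeric field doesn't matter apart from an earlier filtering step. But I want RS244 and RS245 to count only once in the bar chart, because seeing that combination twice just means people tried it a lot, not that the repeat has any special meaning.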
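To make the 1-or-0 idea concrete, here is a minimal sketch that builds a presence table with `pd.crosstab` and caps every count at 1. The column names (`attrib`, `num`, `attrib2`, `group`) are assumed, since the sample has no header row, and the rows are read from an inline string rather than from a file.

```python
import io
import pandas as pd

# The sample rows from above. Column names are an assumption (no header is shown).
raw = """RS232,1.8,focused,C
RS233,2.8,chew,E
RS234,3.8,toy,D
RS235,4.8,poodle,C
RS236,5.8,winding,E
RS237,6.8,up,D
RS238,7.8,focused,B
RS239,9.8,chew,B
RS240,7.8,toy,B
RS241,6.8,toy,B
RS242,5.8,toy,A
RS243,4.8,focused,A
RS244,9.8,chew,A
RS245,8.8,chew,A
RS246,7.8,chew,C
RS247,6.8,winding,C
RS248,5.8,winding,C
RS249,4.8,winding,D
RS250,3.8,toy,D
"""
df = pd.read_csv(io.StringIO(raw), names=["attrib", "num", "attrib2", "group"])

# Raw co-occurrence counts: chew/A is 2 because RS244 and RS245 both land there.
counts = pd.crosstab(df["attrib2"], df["group"])

# Cap every positive count at 1, so each pair is simply present (1) or absent (0).
presence = counts.clip(upper=1)

# Row sums give the number of distinct groups seen with each attrib2 value.
unique_per_attrib2 = presence.sum(axis=1)

print(presence)
print(unique_per_attrib2)
```

The row sums of `presence` are the bar heights I am after; `unique_per_attrib2.plot(kind='bar')` should draw them.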
I end up with this data, which I then plot:
    attrib2 group  count
0      chew     A      2
1      chew     B      1
2      chew     C      1
3      chew     E      1
4   focused     A      1
5   focused     B      1
6   focused     C      1
7    poodle     C      1
8       toy     A      1
9       toy     B      2
10      toy     D      2
11       up     D      1
12  winding     C      2
13  winding     D      1
14  winding     E      1
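Note that repeated pairs have a count > 1, but for plotting I use .value_counts, so I ignore the count field and just plot the number of UNIQUE items paired with each element of attrib2. The histogram I want is simply the number of times each element appears in the attrib2 column above.
Here is the crude way I do it. Surely there must be a cleaner, more idiomatic way to achieve this?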
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import interactive
df= pd.read_csv('out.txt',sep=',',engine='c',lineterminator='\n',header='infer')
# I get group/attrib2 pairs, but I want my plot to be against attrib2
groupout3 = df.groupby(['attrib2']).group.value_counts().sort_index()
# groupby gives counts > 1 for a repeated combination, so they could be capped at 1.
# The following line is not needed: value_counts below counts each remaining row
# once regardless of its count value, so 1, 2, etc. all count as 1.
# groupout3[groupout3 != 0] = 1
# convert back to a DataFrame for plotting
dfgroup = groupout3.to_frame('count')
# turn the index levels back into columns
dfgroup.reset_index(level=['group','attrib2'], inplace=True)
# plot the number of unique pairs per attrib2 value
plt.figure(); dfgroup.attrib2.value_counts().plot(kind='bar')
plt.show()
Surely there is a more elegant way to do this?
Thanks!
Answer 0 (score: 1)
IIUC, you can do it this way:
(df.groupby(['attrib2','group'])
.size()
.reset_index()
.groupby('attrib2')
.size()
.plot.bar(rot=0)
)
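A possible shortcut, offered as a sketch rather than a drop-in replacement: since only the number of distinct groups per attrib2 matters, `groupby(...).nunique()` or `drop_duplicates` followed by `value_counts` should give the same numbers as the chain above. The small DataFrame below is a hand-picked subset of the sample pairs, used only for illustration.

```python
import pandas as pd

# Illustrative subset of the sample pairs (not the full data set).
df = pd.DataFrame({
    "attrib2": ["chew", "chew", "chew", "toy", "toy", "toy", "up"],
    "group":   ["A",    "A",    "B",    "B",   "B",   "D",   "D"],
})

# Option 1: count the distinct groups seen with each attrib2 value.
by_nunique = df.groupby("attrib2")["group"].nunique()

# Option 2: drop repeated (attrib2, group) pairs, then count the rows left per attrib2.
by_dedup = (
    df.drop_duplicates(["attrib2", "group"])["attrib2"]
    .value_counts()
    .sort_index()
)

print(by_nunique)
print(by_dedup)
```

Either result can be plotted the same way, for example with `by_nunique.plot.bar(rot=0)`.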
Data:
In [85]: df
Out[85]:
    attrib  num  attrib2 group
0    RS232  1.8  focused     C
1    RS233  2.8     chew     E
2    RS234  3.8      toy     D
3    RS235  4.8   poodle     C
4    RS236  5.8  winding     E
5    RS237  6.8       up     D
6    RS238  7.8  focused     B
7    RS239  9.8     chew     B
8    RS240  7.8      toy     B
9    RS241  6.8      toy     B
10   RS242  5.8      toy     A
11   RS243  4.8  focused     A
12   RS244  9.8     chew     A
13   RS245  8.8     chew     A
14   RS246  7.8     chew     C
15   RS247  6.8  winding     C
16   RS248  5.8  winding     C
17   RS249  4.8  winding     D
18   RS250  3.8      toy     D