I have an Excel file like the following:
CompanyName High Priority QualityIssue
Customer1 Yes User
Customer1 Yes User
Customer2 No User
Customer3 No Equipment
Customer1 No Neither
Customer3 No User
Customer3 Yes User
Customer3 Yes Equipment
Customer4 No User
I want to count how many times each QualityIssue type appears for each CompanyName, and sort the result by number of occurrences in descending order.
For example, with this code:
df.groupby(["CompanyName", "QualityIssue"]).size().to_frame('Count')
I get:
Out:
CompanyName QualityIssue Count
Customer2 User 1
Customer1 Neither 1
Customer4 User 1
Customer1 User 2
Customer3 Equipment 2
Customer3 User 2
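Now let's say I have a second result like the one above in memory.
What I want is to append the last column of the second query to the end of the first one (in reality the second is not a copy of the first, this is just an example):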
CompanyName QualityIssue Count1 Count2
Customer2 User 1 1
Customer1 Neither 1 1
Customer4 User 1 1
Customer1 User 2 2
Customer3 Equipment 2 2
Customer3 User 2 2
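The problem is that if I do
df['Count']
it doesn't print only that column; it prints everything, just as if I had run
print df
So I can't find a way to take only the last column of one DataFrame and add it to the other.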
Answer 0: (score: 1)
Use groupby and size:
df.groupby(['CompanyName', 'QualityIssue']).size()
CompanyName  QualityIssue
Customer1    Neither         1
             User            2
Customer2    User            1
Customer3    Equipment       2
             User            2
Customer4    User            1
dtype: int64
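The output above is a Series whose group keys sit in a two-level index, which is probably why selecting the Count column in the question still appears to print "everything". A minimal sketch, assuming the data from the question typed in by hand (the High Priority column is left out because it plays no part in the count; the names counts and count_col are only illustrative):

```python
import pandas as pd

# Sample rows copied from the question, without the "High Priority" column.
df = pd.DataFrame({
    "CompanyName": ["Customer1", "Customer1", "Customer2", "Customer3", "Customer1",
                    "Customer3", "Customer3", "Customer3", "Customer4"],
    "QualityIssue": ["User", "User", "User", "Equipment", "Neither",
                     "User", "User", "Equipment", "User"],
})

counts = df.groupby(["CompanyName", "QualityIssue"]).size()

# The group keys are index levels, not columns.
print(counts.index.names)

# Selecting the single column of the framed result keeps that index,
# so the keys are printed next to the values.
count_col = counts.to_frame("Count")["Count"]
print(count_col)

# The bare numbers without the index.
print(count_col.to_numpy())
```

Because the keys travel with the values as an index, two such results can be lined up by key rather than by row position.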
Suppose we have another one in memory:
c1 = df.groupby(['CompanyName', 'QualityIssue']).size()
c2 = c1.copy()
Then use pd.concat:
pd.concat([c1, c2], keys=['Count1', 'Count2']).unstack(0, fill_value=0)
                          Count1  Count2
CompanyName QualityIssue
Customer1   Neither            1       1
            User               2       2
Customer2   User               1       1
Customer3   Equipment          2       2
            User               2       2
Customer4   User               1       1
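Since the question notes that the second result is not really a copy, it may help to see what fill_value=0 does when the two Series do not share the same groups. A small sketch with made-up values (the contents of c1 and c2 here are hypothetical, not taken from the question):

```python
import pandas as pd

names = ["CompanyName", "QualityIssue"]

# Two count Series with partly different (CompanyName, QualityIssue) keys.
c1 = pd.Series(
    [1, 2, 1],
    index=pd.MultiIndex.from_tuples(
        [("Customer1", "Neither"), ("Customer1", "User"), ("Customer2", "User")],
        names=names,
    ),
)
c2 = pd.Series(
    [5, 2],  # hypothetical counts
    index=pd.MultiIndex.from_tuples(
        [("Customer1", "User"), ("Customer3", "Equipment")],
        names=names,
    ),
)

# keys= adds an outer index level holding "Count1"/"Count2";
# unstack(0) turns that level into columns, and fill_value=0
# fills the groups that exist in only one of the two Series.
combined = pd.concat([c1, c2], keys=["Count1", "Count2"]).unstack(0, fill_value=0)
print(combined)
```

Rows are matched on the index keys, so a group present in only one input gets 0 in the other column instead of being dropped or misaligned.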
Add reset_index if you want the index levels turned back into regular columns of the DataFrame:
pd.concat([c1, c2], keys=['Count1', 'Count2']).unstack(0, fill_value=0) \
.reset_index()
CompanyName QualityIssue Count1 Count2
0 Customer1 Neither 1 1
1 Customer1 User 2 2
2 Customer2 User 1 1
3 Customer3 Equipment 2 2
4 Customer3 User 2 2
5 Customer4 User 1 1
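The question also asks for the rows ordered by descending count, which the steps above do not cover. One possible way, sketched here with the final table typed in by hand so that it runs on its own (the names result and ranked are only illustrative), is sort_values:

```python
import pandas as pd

# The combined table shown above, entered manually for a standalone example.
result = pd.DataFrame({
    "CompanyName": ["Customer1", "Customer1", "Customer2",
                    "Customer3", "Customer3", "Customer4"],
    "QualityIssue": ["Neither", "User", "User", "Equipment", "User", "User"],
    "Count1": [1, 2, 1, 2, 2, 1],
    "Count2": [1, 2, 1, 2, 2, 1],
})

# Highest Count1 first; ties are ordered by the two key columns
# so that the result is deterministic.
ranked = (
    result.sort_values(
        ["Count1", "CompanyName", "QualityIssue"],
        ascending=[False, True, True],
    )
    .reset_index(drop=True)
)
print(ranked)
```

Sorting on Count1 alone would also work; the extra key columns only fix the order among rows with equal counts.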