I have a df like this:
Owner  Messages
AAA    (YY) Duplicates
AAA    Missing Number; (VV) Corrected Value; (YY) Duplicates
AAA    (YY) Duplicates
BBB    (YY) Duplicates
BBB    Missing Measure; Missing Number
When I do a normal groupby like this:
df_grouped = df.groupby(['Owner', 'Messages']).size().reset_index(name='count')
df_grouped
I get this, as expected:
   Owner  Messages                                               count
0  AAA    (YY) Duplicates                                        2
1  AAA    Missing Number; (VV) Corrected Value; (YY) Duplicates  1
2  BBB    (YY) Duplicates                                        1
3  BBB    Missing Measure; Missing Number                        1
However, I need the values in the Messages column to be split on ; and each individual message counted separately (desired output):
   Owner  Messages              count
0  AAA    (YY) Duplicates       3
1  AAA    Missing Number        1
2  AAA    (VV) Corrected Value  1
3  BBB    (YY) Duplicates       1
4  BBB    Missing Measure       1
5  BBB    Missing Number        1
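So far, following @LeoRochael's answer on another post, I can split the values of the Messages column on ; and put them into a list. However, after splitting I cannot get the count for each individual message.
How can I get my desired output?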
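Answer 0 (score: 5)
You need to unnest the original DataFrame first, so that each message gets its own row; after that a plain groupby size gives the counts.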
s=df.set_index('Owner').Messages.str.split('; ',expand=True).stack().to_frame('Messages').reset_index()
s.groupby(['Owner','Messages']).size()
Out[1213]:
Owner  Messages
AAA    (VV) Corrected Value    1
       (YY) Duplicates         3
       Missing Number          1
BBB    (YY) Duplicates         1
       Missing Measure         1
       Missing Number          1
dtype: int64
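For what it's worth, the same unnest-then-count idea can probably be written more directly with DataFrame.explode. This is only a sketch: it assumes pandas 0.25 or later (where explode is available), it rebuilds df from the sample in the question, and the names exploded and counts are just illustrative. Passing sort=False to groupby should keep the groups in first-seen order rather than sorting them.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Owner': ['AAA', 'AAA', 'AAA', 'BBB', 'BBB'],
    'Messages': [
        '(YY) Duplicates',
        'Missing Number; (VV) Corrected Value; (YY) Duplicates',
        '(YY) Duplicates',
        '(YY) Duplicates',
        'Missing Measure; Missing Number',
    ],
})

# Split each Messages string into a list, then give every list item its own row.
exploded = df.assign(Messages=df['Messages'].str.split('; ')).explode('Messages')

# Count each (Owner, Messages) pair and flatten the result into a DataFrame.
counts = (
    exploded.groupby(['Owner', 'Messages'], sort=False)
    .size()
    .reset_index(name='count')
)
print(counts)
```

Note this splits on '; ' (semicolon plus one space), the same separator as above, so messages separated differently would need the pattern adjusted.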
Answer 1 (score: 2)
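from collections import Counter
import pandas as pd
# Count every (Owner, message) pair after splitting Messages on '; ',
# then turn the Counter into a DataFrame.
# Unpacking df.values this way assumes df has exactly these two columns.
pd.Series(
    Counter([(o, m) for o, M in df.values for m in M.split('; ')])
).rename_axis(['Owner', 'Message']).reset_index(name='Count')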
   Owner  Message               Count
0  AAA    (VV) Corrected Value  1
1  AAA    (YY) Duplicates       3
2  AAA    Missing Number        1
3  BBB    (YY) Duplicates       1
4  BBB    Missing Measure       1
5  BBB    Missing Number        1
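If the real df has more than these two columns, unpacking df.values as above would fail. A slightly more defensive sketch of the same Counter idea is below. It is an illustration only: it zips over the two named columns, splits on a bare ';' and strips whitespace from each piece, and the names pair_counts and result are made up for this example. The sample df is rebuilt from the question.

```python
from collections import Counter

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Owner': ['AAA', 'AAA', 'AAA', 'BBB', 'BBB'],
    'Messages': [
        '(YY) Duplicates',
        'Missing Number; (VV) Corrected Value; (YY) Duplicates',
        '(YY) Duplicates',
        '(YY) Duplicates',
        'Missing Measure; Missing Number',
    ],
})

# Count (owner, message) pairs. Zipping the named columns means any
# extra columns in df are simply ignored.
pair_counts = Counter(
    (owner, message.strip())
    for owner, messages in zip(df['Owner'], df['Messages'])
    for message in messages.split(';')
)

# The tuple keys become a two-level index, which is then flattened
# into ordinary columns.
result = (
    pd.Series(pair_counts)
    .rename_axis(['Owner', 'Messages'])
    .reset_index(name='count')
)
print(result)
```

The row order of the result may depend on the pandas version, so sort explicitly (for example with sort_values) if a particular order matters.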