Split column values by a special character and group with pandas

Time: 2018-04-19 14:20:32

Tags: python-3.x pandas split group-by

I have a df like this,

Owner   Messages
AAA     (YY) Duplicates
AAA     Missing Number; (VV) Corrected Value; (YY) Duplicates
AAA     (YY) Duplicates
BBB     (YY) Duplicates
BBB     Missing Measure; Missing Number

When I do a normal groupby like this,

df_grouped = df.groupby(['Owner', 'Messages']).size().reset_index(name='count')
df_grouped

I get this, as expected,

    Owner  Messages                                               count
0   AAA   (YY) Duplicates                                           2
1   AAA   Missing Number; (VV) Corrected Value; (YY) Duplicates     1
2   BBB   (YY) Duplicates                                           1
3   BBB   Missing Measure; Missing Number                           1
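
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample frame and runs the plain groupby. It assumes both columns hold plain strings; the DataFrame construction is a guess at how the data was created, not the asker's code.

```python
import pandas as pd

# Sample data copied from the question (the construction itself is an assumption)
df = pd.DataFrame({
    "Owner": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "Messages": [
        "(YY) Duplicates",
        "Missing Number; (VV) Corrected Value; (YY) Duplicates",
        "(YY) Duplicates",
        "(YY) Duplicates",
        "Missing Measure; Missing Number",
    ],
})

# Plain groupby: each whole Messages string is treated as a single key,
# so the ';'-separated parts are not counted individually
df_grouped = df.groupby(["Owner", "Messages"]).size().reset_index(name="count")
print(df_grouped)
```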

However, I need the counts for the individual messages separated by ; within Messages (desired output).

   Owner    Messages             count
0   AAA    (YY) Duplicates       3
1   AAA    Missing Number        1
2   AAA    (VV) Corrected Value  1
3   BBB    (YY) Duplicates       1
4   BBB    Missing Measure       1
5   BBB    Missing Number        1

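So far, based on @LeoRochael's answer to another post, I split the values of the Messages column on ; and put them into lists. However, I could not get the individual counts after the split.

How can I get my desired output?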

2 Answers:

Answer 0 (score: 5)

You need to unnest the original dataframe first; then a groupby with size gives the counts

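# Split each Messages value on '; ' into separate columns, then stack them into one row per message
s=df.set_index('Owner').Messages.str.split('; ',expand=True).stack().to_frame('Messages').reset_index()
# Count how often each (Owner, Messages) pair occurs
s.groupby(['Owner','Messages']).size()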
Out[1213]: 
Owner  Messages            
AAA    (VV) Corrected Value    1
       (YY) Duplicates         3
       Missing Number          1
BBB    (YY) Duplicates         1
       Missing Measure         1
       Missing Number          1
dtype: int64
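
As a possible alternative to the stack approach, the same unnesting can be sketched with DataFrame.explode. This assumes pandas 0.25 or newer, which is later than this answer, so it is not the answerer's method. The sample frame is rebuilt from the question, and the names exploded and counts are only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Owner": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "Messages": [
        "(YY) Duplicates",
        "Missing Number; (VV) Corrected Value; (YY) Duplicates",
        "(YY) Duplicates",
        "(YY) Duplicates",
        "Missing Measure; Missing Number",
    ],
})

# Turn each Messages string into a list, then emit one row per list element
exploded = df.assign(Messages=df["Messages"].str.split("; ")).explode("Messages")

# Count each (Owner, Messages) pair and return a flat frame with a 'count' column
counts = (
    exploded.groupby(["Owner", "Messages"])
    .size()
    .reset_index(name="count")
)
print(counts)
```

Calling reset_index(name="count") gives the flat three-column layout asked for in the question. The rows come out sorted by group key, not in order of first appearance.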

Answer 1 (score: 2)

from collections import Counter
import pandas as pd

pd.Series(
    Counter([(o, m) for o, M in df.values for m in M.split('; ')])
).rename_axis(['Owner', 'Message']).reset_index(name='Count')

  Owner               Message  Count
0   AAA  (VV) Corrected Value      1
1   AAA       (YY) Duplicates      3
2   AAA        Missing Number      1
3   BBB       (YY) Duplicates      1
4   BBB       Missing Measure      1
5   BBB        Missing Number      1
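
A self-contained sketch of the same Counter idea, with the sample frame rebuilt from the question. The explicit sort_index() is an addition, made on the assumption that row order may otherwise follow insertion order depending on the pandas and Python versions. The names pair_counts and result are only illustrative.

```python
from collections import Counter

import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Owner": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "Messages": [
        "(YY) Duplicates",
        "Missing Number; (VV) Corrected Value; (YY) Duplicates",
        "(YY) Duplicates",
        "(YY) Duplicates",
        "Missing Measure; Missing Number",
    ],
})

# Count every (owner, message) pair after splitting each row on '; '
pair_counts = Counter(
    (owner, message)
    for owner, messages in df[["Owner", "Messages"]].itertuples(index=False)
    for message in messages.split("; ")
)

# Tuple keys become a two-level index; sort it so the row order is deterministic
result = (
    pd.Series(pair_counts)
    .sort_index()
    .rename_axis(["Owner", "Message"])
    .reset_index(name="Count")
)
print(result)
```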