How do I count how many times each sequence of column values occurs in a DataFrame?

Time: 2017-02-14 12:59:12

Tags: python pandas

I have this DataFrame df:

AA_0    AA_1     AA_2     AA_3
store   cake     mass     visit    
store   mass     visit
mass    store
store   cake     mass     visit

I want to count how many times each AA_0 - AA_3 sequence occurs in df, and present the result like this:

result = 

    count   data
    2       store/cake/mass/visit
    1       store/mass/visit
    1       mass/store

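How can I do this?

1 answer:

Answer 0 (score: 2)

You can join the non-missing values of each row into a single string and then count the strings: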

df['data'] = df.apply(lambda x: '/'.join(x.dropna()), axis=1)
print (df)
    AA_0   AA_1   AA_2   AA_3                   data
0  store   cake   mass  visit  store/cake/mass/visit
1  store   mass  visit    NaN       store/mass/visit
2   mass  store    NaN    NaN             mass/store
3  store   cake   mass  visit  store/cake/mass/visit

# Name the index 'data' and the counts 'count' so the labels match their contents
result = df['data'].value_counts().rename_axis('data').reset_index(name='count')
print(result)
                    data  count
0  store/cake/mass/visit      2
1       store/mass/visit      1
2             mass/store      1
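
For reference, the approach above can be sketched as a self-contained script. This is a minimal sketch, not the answerer's exact code: the sample frame is rebuilt by hand from the question, the missing cells are assumed to be NaN, and the helper names `cols` and `data` are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; missing cells are assumed to be NaN.
df = pd.DataFrame({
    'AA_0': ['store', 'store', 'mass', 'store'],
    'AA_1': ['cake', 'mass', 'store', 'cake'],
    'AA_2': ['mass', 'visit', np.nan, 'mass'],
    'AA_3': ['visit', np.nan, np.nan, 'visit'],
})

cols = ['AA_0', 'AA_1', 'AA_2', 'AA_3']

# Join the non-missing values of each row into one '/'-separated string.
data = df[cols].apply(lambda row: '/'.join(row.dropna()), axis=1)

# Count each distinct string, label the columns, and reorder them
# to match the count/data layout requested in the question.
result = (
    data.value_counts()
        .rename_axis('data')
        .reset_index(name='count')[['count', 'data']]
)
print(result)
```

Passing `name='count'` to `reset_index` sets the label of the counts column explicitly, so the column names do not depend on how a given pandas version names the output of `value_counts`.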

If the missing values are spaces (or empty strings) instead of NaN:

df['data'] = df.apply(lambda x: '/'.join(x), axis=1).str.strip('/ ')
print (df)
    AA_0   AA_1   AA_2   AA_3                   data
0  store   cake   mass  visit  store/cake/mass/visit
1  store   mass  visit              store/mass/visit
2   mass  store                           mass/store
3  store   cake   mass  visit  store/cake/mass/visit

# Name the index 'data' and the counts 'count' so the labels match their contents
result = df['data'].value_counts().rename_axis('data').reset_index(name='count')
print(result)
                    data  count
0  store/cake/mass/visit      2
1       store/mass/visit      1
2             mass/store      1
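
Stripping `'/ '` from the ends only helps when the blank cells are trailing. A possible alternative, sketched here under the assumption that blanks are empty or whitespace-only strings, is to convert them to NaN first and then reuse the `dropna` approach. The sample frame and the name `cleaned` are hypothetical and exist only for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the sample frame: missing cells hold '' or ' '.
df = pd.DataFrame({
    'AA_0': ['store', 'store', 'mass', 'store'],
    'AA_1': ['cake', 'mass', 'store', 'cake'],
    'AA_2': ['mass', 'visit', '', 'mass'],
    'AA_3': ['visit', ' ', '', 'visit'],
})

# Turn empty or whitespace-only cells into NaN so dropna() can skip them.
cleaned = df.replace(r'^\s*$', np.nan, regex=True)

# Same join-and-count steps as in the NaN case.
data = cleaned.apply(lambda row: '/'.join(row.dropna()), axis=1)
result = data.value_counts().rename_axis('data').reset_index(name='count')
print(result)
```

Because blanks are removed before joining, this variant also avoids doubled separators such as `store//visit` if a blank cell ever appears in the middle of a row.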