I have this DataFrame df:
AA_0   AA_1   AA_2   AA_3
store  cake   mass   visit
store  mass   visit
mass   store
store  cake   mass   visit
I want to count how many times each sequence AA_0 - AA_3 occurs in df and present the result like this:
result =
count  data
2      store/cake/mass/visit
1      store/mass/visit
1      mass/store
How can I do this?
Answer 0 (score: 2)
You can use:
df['data'] = df.apply(lambda x: '/'.join(x.dropna()), axis=1)
print(df)
    AA_0   AA_1   AA_2   AA_3                   data
0  store   cake   mass  visit  store/cake/mass/visit
1  store   mass  visit    NaN       store/mass/visit
2   mass  store    NaN    NaN             mass/store
3  store   cake   mass  visit  store/cake/mass/visit
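# Name the index 'data' and the counts 'count', then put 'count' first as requested
result = df['data'].value_counts().rename_axis('data').reset_index(name='count')[['count', 'data']]
print(result)
   count                   data
0      2  store/cake/mass/visit
1      1       store/mass/visit
2      1             mass/store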
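For reference, the steps above can be put together into a self-contained sketch. It assumes the missing trailing values are NaN, and the variable name `sequences` is only illustrative; it is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question; missing trailing values are assumed to be NaN.
df = pd.DataFrame(
    [
        ["store", "cake", "mass", "visit"],
        ["store", "mass", "visit", np.nan],
        ["mass", "store", np.nan, np.nan],
        ["store", "cake", "mass", "visit"],
    ],
    columns=["AA_0", "AA_1", "AA_2", "AA_3"],
)

# Join the non-missing values of each row into one "/"-separated string.
sequences = df.apply(lambda row: "/".join(row.dropna()), axis=1)

# Count each distinct sequence and name both columns explicitly,
# so the result does not depend on pandas' default column names.
result = (
    sequences.value_counts()
    .rename_axis("data")
    .reset_index(name="count")[["count", "data"]]
)
print(result)
```

Passing `name="count"` to `reset_index` is a deliberate choice here: the default name of the counts column has differed between pandas versions, so setting it explicitly keeps the column labels predictable.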
If the missing values are spaces rather than NaN (and only occur at the end of a row):
df['data'] = df.apply(lambda x: '/'.join(x), axis=1).str.strip('/ ')
print(df)
    AA_0   AA_1   AA_2   AA_3                   data
0  store   cake   mass  visit  store/cake/mass/visit
1  store   mass  visit              store/mass/visit
2   mass  store                           mass/store
3  store   cake   mass  visit  store/cake/mass/visit
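# Name the index 'data' and the counts 'count', then put 'count' first as requested
result = df['data'].value_counts().rename_axis('data').reset_index(name='count')[['count', 'data']]
print(result)
   count                   data
0      2  store/cake/mass/visit
1      1       store/mass/visit
2      1             mass/store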
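Stripping '/' and spaces only trims the ends of the joined string, so a blank cell in the middle of a row would leave an empty segment behind. One possible alternative, sketched here under the assumption that blank cells are empty or whitespace-only strings, is to convert them to NaN first and then reuse the dropna approach. The names `cleaned`, `sequences`, and `counts` are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data where missing values are assumed to be blank strings.
df = pd.DataFrame(
    [
        ["store", "cake", "mass", "visit"],
        ["store", "mass", "visit", " "],
        ["mass", "store", "", " "],
        ["store", "cake", "mass", "visit"],
    ],
    columns=["AA_0", "AA_1", "AA_2", "AA_3"],
)

# Treat empty or whitespace-only cells as missing values.
cleaned = df.replace(r"^\s*$", np.nan, regex=True)

# Join the remaining values of each row, skipping the missing ones.
sequences = cleaned.apply(lambda row: "/".join(row.dropna()), axis=1)

# Count how often each joined sequence occurs.
counts = sequences.value_counts()
print(counts)
```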