Most common elements in a pandas Series

Time: 2020-01-04 10:44:13

Tags: python pandas

I have this dataset:

Dataset

The cuisine countries appear in it repeatedly. What I want to output is a list of the 5 most popular ingredients for each country.

My code so far:

import pandas as pd
from collections import Counter


filename="food.json"
food_dataset = pd.read_json(filename)

# get the separate columns
country = food_dataset.loc[:,"country"]
ingredients = food_dataset.loc[:,"ingredients"]


Counter = Counter(ingredients) 

most_occur = Counter.most_common(3) 

print(most_occur)

1 answer:

Answer 0 (score: 1)

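Solution for pandas 0.25+: use DataFrame.explode, then GroupBy.apply with a lambda function that calls Series.value_counts and keeps the first N index values, i.e. the N most frequent ingredients (N = 3 in this example; set N = 5 for the top five):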

food_dataset = pd.DataFrame({'cuisine':['greek','southern_us'],
                             'ingredients':[list('andnsndnfndn'),
                                            list('ndnsndnfnsnd')]})
print (food_dataset)
       cuisine                           ingredients
0        greek  [a, n, d, n, s, n, d, n, f, n, d, n]
1  southern_us  [n, d, n, s, n, d, n, f, n, s, n, d]

N = 3
df = (food_dataset.explode("ingredients")
                  .groupby('cuisine')['ingredients']
                  .apply(lambda x: x.value_counts().index[:N].tolist())
                  .reset_index())
print (df)
       cuisine ingredients
0        greek   [n, d, a]
1  southern_us   [n, d, s]
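
The chain above can be unpacked step by step. The following is a minimal sketch that reuses the toy food_dataset from this answer; the intermediate names exploded and counts are illustrative and not part of the original answer. It assumes pandas 0.25 or newer, because DataFrame.explode was added in that release.

```python
import pandas as pd

food_dataset = pd.DataFrame({'cuisine': ['greek', 'southern_us'],
                             'ingredients': [list('andnsndnfndn'),
                                             list('ndnsndnfnsnd')]})

# Step 1: one row per (cuisine, ingredient) pair.
# Each list element becomes its own row; the cuisine value is repeated.
exploded = food_dataset.explode('ingredients')

# Step 2: count how often each ingredient occurs within each cuisine.
# The result is a Series indexed by (cuisine, ingredient).
counts = exploded.groupby('cuisine')['ingredients'].value_counts()

print(exploded.shape)                    # (24, 2)
print(counts.loc[('greek', 'n')])        # 6
print(counts.loc[('southern_us', 's')])  # 2
```

Taking the first N index entries of each per-cuisine value_counts result, as the lambda in the answer does, then gives the N most frequent ingredients.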

Alternative solution with collections.Counter (requires from collections import Counter):

df = (food_dataset.explode("ingredients")
                  .groupby('cuisine')['ingredients']
                  .apply(lambda x: [y[0] for y in Counter(x).most_common(N)])
                  .reset_index())
print (df)
       cuisine ingredients
0        greek   [n, d, a]
1  southern_us   [n, d, s]
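
The Counter logic can also be pulled out of the lambda into a small helper. This is only a sketch: top_n is a hypothetical name chosen here, not a pandas or collections function. Counter.most_common orders elements with equal counts by first occurrence (Python 3.7+), which is why a, rather than s or f, fills the third slot for greek.

```python
from collections import Counter

import pandas as pd


def top_n(values, n):
    """Return the n most frequent items of an iterable, most frequent first.

    Hypothetical helper for illustration; ties keep first-seen order.
    """
    return [item for item, _ in Counter(values).most_common(n)]


food_dataset = pd.DataFrame({'cuisine': ['greek', 'southern_us'],
                             'ingredients': [list('andnsndnfndn'),
                                             list('ndnsndnfnsnd')]})

# Pool all ingredients per cuisine, then rank them with the helper.
result = (food_dataset.explode('ingredients')
                      .groupby('cuisine')['ingredients']
                      .apply(lambda s: top_n(s, 3))
                      .reset_index())
print(result)
```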

Solution if each value in the cuisine column is unique:

food_dataset['top'] = (food_dataset['ingredients']
                          .apply(lambda x: [y[0] for y in Counter(x).most_common(N)]))
print (food_dataset)
       cuisine                           ingredients        top
0        greek  [a, n, d, n, s, n, d, n, f, n, d, n]  [n, d, a]
1  southern_us  [n, d, n, s, n, d, n, f, n, s, n, d]  [n, d, s]
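
The uniqueness condition matters because the per-row version never combines rows. A small sketch of the difference, using a hypothetical two-recipe dataset in which greek appears twice (the data and the names recipes, per_row and per_cuisine are invented purely for illustration):

```python
from collections import Counter

import pandas as pd

# Hypothetical data: the same cuisine occurs in two rows.
recipes = pd.DataFrame({'cuisine': ['greek', 'greek'],
                        'ingredients': [['feta', 'olive', 'feta'],
                                        ['olive', 'olive', 'lemon']]})

# Per-row ranking: each recipe is counted on its own.
per_row = recipes['ingredients'].apply(
    lambda x: [y[0] for y in Counter(x).most_common(1)])

# Per-cuisine ranking: ingredients of all rows of a cuisine are pooled first.
per_cuisine = (recipes.explode('ingredients')
                      .groupby('cuisine')['ingredients']
                      .apply(lambda x: [y[0] for y in Counter(x).most_common(1)]))

print(per_row.tolist())      # [['feta'], ['olive']]
print(per_cuisine['greek'])  # ['olive']
```

With repeated cuisine values, as in the question's dataset, the explode plus groupby variants are the ones that answer "top ingredients per cuisine".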