Counting the frequency of text in lists within a DataFrame column

Time: 2019-05-31 11:06:28

Tags: python pandas dataframe

I am new to Python / pandas, and I have a problem that I will lay out logically to help me learn.

I have a DataFrame called party that contains the following data:

(index)    name                  invitees
0            birthday party     [mike, peter]
1            Retirement          [peter]
2            office opening     [simon, mike, peter]

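I would like to create a dictionary that shows each distinct name in the invitees column along with how often it occurs, for example like this: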

mike: 2, peter: 3, simon: 1

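I tried to find something similar here, but I am not quite sure of the right terminology to search for.

Any help would be greatly appreciated. Many thanks.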

4 answers:

Answer 0 (score: 0)

You can use Counter from collections and chain from itertools to solve this:

from collections import Counter
from itertools import chain

import pandas as pd

# Sample data: each row holds a list of invitee names
df2 = pd.DataFrame({
    'name': ["blah", "blah-blah", "waka-waka"],
    'invites': [['mike', 'peter'], ['peter', 'mike'], ['waka', 'peter', 'simon']]
})
# Flatten the per-row lists into one stream of names and count them
Counter(chain.from_iterable(df2['invites']))

Counter({'mike': 2, 'peter': 3, 'simon': 1, 'waka': 1})
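
For reference, here is a minimal sketch of the same idea applied to the data from the question. The frame name `party` and the variable `counts` are assumptions chosen for illustration. `chain.from_iterable` yields the names lazily, so `Counter` can consume it without building an intermediate list, and `dict()` turns the result into the plain dictionary the question asks for.

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Data from the question; the frame name `party` is an assumption.
party = pd.DataFrame({
    "name": ["birthday party", "Retirement", "office opening"],
    "invitees": [["mike", "peter"], ["peter"], ["simon", "mike", "peter"]],
})

# Flatten the per-row lists lazily and count each name.
counts = dict(Counter(chain.from_iterable(party["invitees"])))
print(counts)  # {'mike': 2, 'peter': 3, 'simon': 1}
```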

Answer 1 (score: 0)

Collect the names from the DataFrame, then use Counter:

from collections import Counter
import pandas as pd

# setup test data
data = {'invitees': [['mike', 'peter'], ['peter'], ['simon', 'mike', 'peter']]}
data = pd.DataFrame(data=data)

# select data series
names_lists = data['invitees']

# collect names
all_names = []
for item in names_lists:
    for name in item:
        all_names.append(name)

# count occurrence
summary = Counter(all_names)

Output:

{'peter': 3, 'mike': 2, 'simon': 1}
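
Note that printing a Counter directly shows it wrapped as `Counter({...})` rather than as the bare dictionary above. The sketch below, which hard-codes the flattened name list rather than rebuilding the DataFrame, shows a few ways to work with the result. It relies only on documented `collections.Counter` behaviour; the variable `plain` is an illustrative name.

```python
from collections import Counter

# The flattened names, as collected by the loop above.
all_names = ["mike", "peter", "peter", "simon", "mike", "peter"]
summary = Counter(all_names)

# Counter is a dict subclass, so normal key lookup works.
print(summary["peter"])        # 3
print(summary["nobody"])       # 0 (missing keys count as zero)

# most_common() returns (name, count) pairs, highest count first.
print(summary.most_common(2))  # [('peter', 3), ('mike', 2)]

# dict() gives a plain dictionary if that is what is needed.
plain = dict(summary)
```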

Answer 2 (score: 0)

from collections import Counter

import pandas as pd

invitees = [["mike", "peter"], ["peter"], ["simon", "mike", "peter"]]
name = ["birthday party", "Retirement", "office opening"]

new_df = pd.DataFrame(data={"name": name, "invitees": invitees})

# Gather every invitee from every row into one flat list
all_invitees = []
for i, row in new_df.iterrows():
    all_invitees.extend(row["invitees"])

# Count the names and convert the Counter to a plain dict
invitees_count = dict(Counter(all_invitees))
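
A possible variation on the loop above, sketched under the assumption that only the invitees column is needed: iterate over that column directly and let `Counter.update` do the tallying, which avoids both `iterrows` and the intermediate list. The names `running` and `counts_by_name` are illustrative.

```python
from collections import Counter

import pandas as pd

new_df = pd.DataFrame({
    "name": ["birthday party", "Retirement", "office opening"],
    "invitees": [["mike", "peter"], ["peter"], ["simon", "mike", "peter"]],
})

running = Counter()
for row_invitees in new_df["invitees"]:
    # update() adds one to the count of every name in this row's list
    running.update(row_invitees)

counts_by_name = dict(running)
```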

Answer 3 (score: 0)

Just for fun:

df['invitees'].apply(pd.Series).unstack().reset_index(name='n').drop('level_1', axis=1).dropna().groupby('n').count().to_dict()['level_0']

{'mike': 2, 'peter': 3, 'simon': 1}
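
A shorter pandas-only route is sketched below. It is not the method used in the one-liner above but a different technique: `Series.explode`, which assumes pandas 0.25 or later, followed by `value_counts`. The frame is rebuilt from the question's data so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["birthday party", "Retirement", "office opening"],
    "invitees": [["mike", "peter"], ["peter"], ["simon", "mike", "peter"]],
})

# explode() gives each list element its own row;
# value_counts() then tallies how often each name appears.
counts = df["invitees"].explode().value_counts().to_dict()
```

The result holds the same name-to-count pairs as the dictionary above; `value_counts` orders them by descending count.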