Question

So I basically have a pandas DataFrame, say:

1. oshin oshin1 oshin2

2. oshin3 oshin2 oshin4

I want to get a counter in this form; basically, my output should be:

oshin:1 oshin1:1 oshin2:2 oshin3:1 oshin4:1

That way I can export the output to a CSV file, since it will be long. How do I do this in pandas? Or, how can I do this for any column in pandas?

Answer 1

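I think you need to first create lists in each column with apply and split, then convert to a numpy array with values and flatten it with numpy.ravel. Convert the result to a list of lists, join them into one array of words with numpy.concatenate, count the words with Counter, and finally convert to a dict: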

print (df)
                    col
0   oshin oshin1 oshin2
1  oshin3 oshin2 oshin4

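import numpy as np
from collections import Counter

# columns to count; list every column you want, e.g. ['col', ...]
cols = ['col']
# split each cell into a list of words, flatten to a list of lists,
# join them into one array of words with np.concatenate, then count
d = dict(Counter(np.concatenate(df[cols].apply(lambda x: x.str.split())
                                        .values.ravel().tolist())))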
print (d)
{'oshin3': 1, 'oshin4': 1, 'oshin1': 1, 'oshin': 1, 'oshin2': 2}
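For several columns, here is a sketch that stays inside pandas. It flattens the selected columns with to_numpy().ravel(), splits each cell, and uses Series.explode (pandas 0.25 or later) to get one row per word before calling value_counts. This swaps np.concatenate and Counter for explode. The second column col2 and its values are hypothetical; they are added only to show more than one column being counted.

```python
import pandas as pd

# Example data from the question plus a hypothetical second column 'col2'
df = pd.DataFrame({
    "col": ["oshin oshin1 oshin2", "oshin3 oshin2 oshin4"],
    "col2": ["oshin oshin5", "oshin5"],
})

cols = ["col", "col2"]  # the columns whose words should be counted

# ravel() flattens the selected cells into one 1-D array of strings,
# str.split() turns each string into a list of words,
# explode() gives one row per word, and value_counts() counts them
words = pd.Series(df[cols].to_numpy().ravel()).str.split().explode()
d = words.value_counts().to_dict()
print(d)
```

Missing cells become NaN after split/explode, and value_counts drops NaN by default, so they are not counted.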

But if there is only one column (thanks to Jon Clements):

d = dict(df['col'].str.split().map(Counter).sum())
print (d)
{'oshin3': 1, 'oshin4': 1, 'oshin1': 1, 'oshin': 1, 'oshin2': 2}
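The same Counter idea also extends to any number of columns without building a Series of Counter objects. Below is a minimal sketch that assumes every non-missing cell is a whitespace-separated string. The helper name count_words is hypothetical and is not part of pandas.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({"col": ["oshin oshin1 oshin2", "oshin3 oshin2 oshin4"]})


def count_words(frame, cols):
    """Count whitespace-separated words across the given columns (hypothetical helper)."""
    counter = Counter()
    for c in cols:
        # dropna() skips missing cells; split() with no argument splits on any whitespace
        for cell in frame[c].dropna():
            counter.update(cell.split())
    return dict(counter)


print(count_words(df, ["col"]))
# {'oshin': 1, 'oshin1': 1, 'oshin2': 2, 'oshin3': 1, 'oshin4': 1}
```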

Edit:

Another, faster solution from John Galt, thanks:

d = pd.Series(' '.join(df['col']).split()).value_counts().to_dict()
print (d)
{'oshin3': 1, 'oshin4': 1, 'oshin1': 1, 'oshin': 1, 'oshin2': 2}
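The question also asks to export the result to CSV. This sketch builds on the value_counts solution and assumes all cells in 'col' are strings, since ' '.join fails on NaN. The column names word and count and the file name word_counts.csv are illustrative choices, not anything from the original answer.

```python
import pandas as pd

df = pd.DataFrame({"col": ["oshin oshin1 oshin2", "oshin3 oshin2 oshin4"]})

# Same idea as the answer: join all cells, split into words, count
counts = pd.Series(" ".join(df["col"]).split()).value_counts()

# Turn the counts into a two-column table; the column names are arbitrary
table = counts.rename_axis("word").reset_index(name="count")

# To write a file, use e.g. table.to_csv("word_counts.csv", index=False)
# (hypothetical file name); with no path, to_csv returns the CSV text instead
csv_text = table.to_csv(index=False)
print(csv_text)
```

Because value_counts sorts by count in descending order, the most frequent words appear first in the CSV.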

Get word counts from a pandas DataFrame where each column holds lists of words
