Question

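I have a large CSV file (more than 66k rows), and I want to count how many times a string occurs in each row. I'm specifically interested in one column, where each row contains a short sentence, like this: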

Example of data:
Sam ate an apple and she felt great
Jill thinks the sky is purple but Bob says it's blue
Ralph wants to go apple picking this fall

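I know how to do this for a text file, but I'm having trouble applying the same technique to a CSV. I have been using pandas and tried a few approaches, but they either raise errors or return an empty DataFrame.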

Attempts:
import pandas

my_file = "NEISS2014.csv"
df = pandas.read_csv(my_file)

df.groupby(df['sentence'].map(lambda x:'apple' if 'apple' in x else x)).sum()
df[df['sentence'].str.contains("apple") == True]

If anyone could help me debug this, I would greatly appreciate it!

Answer 1

I think you can use str.count on the sentence column:

print(df)
#                                            sentence
#0    Sam ate an apple and she felt great apple apple
#1  Jill thinks the sky is purple but Bob says it'...
#2          Ralph wants to go apple picking this fall

print(df.columns)
#Index(['sentence'], dtype='object')

# Count the occurrences of 'apple' in each row of the column
df['count'] = df['sentence'].str.count('apple')
print(df)
#                                            sentence  count
#0    Sam ate an apple and she felt great apple apple      3
#1  Jill thinks the sky is purple but Bob says it'...      0
#2          Ralph wants to go apple picking this fall      1
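
A few things may need handling in a real CSV: empty cells (read as NaN), capitalised matches, and matches inside longer words. The following is a minimal sketch under stated assumptions, not a definitive solution. The in-memory CSV, the id column, the fourth and fifth rows, and the word_count column are hypothetical and only there for illustration; only the sentence column and 'apple' come from the question.

```python
import io
import re

import pandas as pd

# Hypothetical in-memory CSV standing in for the real file.
# Row 5 has an empty sentence, which pandas reads as NaN.
csv_text = (
    "id,sentence\n"
    "1,Sam ate an apple and she felt great\n"
    "2,Jill thinks the sky is purple but Bob says it's blue\n"
    "3,Ralph wants to go apple picking this fall\n"
    "4,Apple pie beats pineapple and crabapple cake\n"
    "5,\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Replace missing values with empty strings so every row gets an integer count.
sentences = df["sentence"].fillna("")

# Plain substring count, case-sensitive: also matches inside "pineapple".
df["count"] = sentences.str.count("apple")

# Whole-word, case-insensitive count: the pattern is a regular expression,
# so \b word boundaries and re.IGNORECASE can be used.
df["word_count"] = sentences.str.count(r"\bapple\b", flags=re.IGNORECASE)

# Total number of substring matches across the whole column.
total = int(df["count"].sum())

print(df[["sentence", "count", "word_count"]])
print("total:", total)
```

Because str.count treats its pattern as a regular expression, a search string that contains characters such as . or ( should be passed through re.escape first.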
