Count the occurrences of a string in a column of a CSV file

Date: 2016-04-28 05:28:35

Tags: python string csv pandas

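I have a large CSV file (more than 66k rows) and I want to count how many times a string appears in each row. I am interested in one column in particular, where every row holds a short sentence, like this: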

Example of data:
Sam ate an apple and she felt great
Jill thinks the sky is purple but Bob says it's blue
Ralph wants to go apple picking this fall

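I know how to do this for a text file, but I am having trouble applying the same technique to a CSV. I have been using pandas and have tried a few approaches, but they either raise an error or return an empty DataFrame.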
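For comparison, the text-file technique (calling str.count on each line) carries over to a single CSV column with Python's standard csv module. This is a minimal sketch, assuming the column header is sentence as in the attempts; csv_text is a hypothetical in-memory stand-in for the real NEISS2014.csv.

```python
import csv
import io

# Hypothetical stand-in for the real file; with the actual CSV you would use
# open("NEISS2014.csv", newline="") instead of io.StringIO.
csv_text = """sentence
Sam ate an apple and she felt great
Jill thinks the sky is purple but Bob says it's blue
Ralph wants to go apple picking this fall
"""

# DictReader yields one dict per data row, keyed by the header line.
reader = csv.DictReader(io.StringIO(csv_text))

# str.count is the same method you would call on a line of a text file.
counts = [row["sentence"].count("apple") for row in reader]
print(counts)  # [1, 0, 1]
```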

Attempts:
import pandas

my_file = "NEISS2014.csv"
df = pandas.read_csv(my_file)

df.groupby(df['sentence'].map(lambda x:'apple' if 'apple' in x else x)).sum()
df[df['sentence'].str.contains("apple") == True]
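For reference, str.contains returns a boolean Series with one True/False per row, so indexing with it keeps the matching rows and summing it counts how many rows contain the string at least once (rows, not occurrences). A minimal sketch, assuming the column is named sentence; csv_text is a hypothetical stand-in for the file. The na=False argument is an assumption about the real data: if the column has missing values, str.contains returns NaN for them, and pandas refuses to index with a mask that contains NaN.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for NEISS2014.csv.
csv_text = """sentence
Sam ate an apple and she felt great
Jill thinks the sky is purple but Bob says it's blue
Ralph wants to go apple picking this fall
"""
df = pd.read_csv(io.StringIO(csv_text))

# Boolean Series; na=False treats missing sentences as "no match".
mask = df["sentence"].str.contains("apple", na=False)

# Keep only the rows that mention "apple".
matching_rows = df[mask]

# Summing booleans counts matching rows.
rows_with_apple = int(mask.sum())
print(rows_with_apple)  # 2
```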

If anyone could help me debug this, I would really appreciate it!

1 answer:

Answer 0 (score: 2)

I think you can use str.count on the column sentence:

print(df)
#                                            sentence
#0    Sam ate an apple and she felt great apple apple
#1  Jill thinks the sky is purple but Bob says it'...
#2          Ralph wants to go apple picking this fall

print(df.columns)
#Index(['sentence'], dtype='object')

# str.count counts the matches in each row and returns one number per row
df['count'] = df['sentence'].str.count('apple')
print(df)
#                                            sentence  count
#0    Sam ate an apple and she felt great apple apple      3
#1  Jill thinks the sky is purple but Bob says it'...      0
#2          Ralph wants to go apple picking this fall      1
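One caveat: str.count treats its argument as a regular expression and matches substrings case-sensitively, so 'apple' is also counted inside 'pineapple' but not in 'Apple'. If only whole words matter, a word-boundary pattern with the flags argument is one option, and summing the column gives the total across all rows. A sketch under those assumptions; the fourth sentence is a hypothetical addition chosen to show the difference.

```python
import re

import pandas as pd

df = pd.DataFrame({"sentence": [
    "Sam ate an apple and she felt great apple apple",
    "Jill thinks the sky is purple but Bob says it's blue",
    "Ralph wants to go apple picking this fall",
    "Pineapple is not an apple",  # hypothetical extra row
]})

# Substring count, as in the answer: "Pineapple" contributes a match.
df["count"] = df["sentence"].str.count("apple")

# \b restricts matches to the whole word; IGNORECASE would also accept "Apple".
df["word_count"] = df["sentence"].str.count(r"\bapple\b", flags=re.IGNORECASE)

# Total whole-word occurrences over the entire column.
total = int(df["word_count"].sum())
print(total)  # 5
```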