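I have a large CSV file (over 66k rows), and I want to count how many times a string occurs in each row. I'm specifically interested in one column, where each row holds a short sentence, like this: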
Example of data:
Sam ate an apple and she felt great
Jill thinks the sky is purple but Bob says it's blue
Ralph wants to go apple picking this fall
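I know how to do this for a text file, but I'm having trouble applying the same technique to a CSV. I've been using pandas and have tried a few approaches, but they either raise errors or return an empty DataFrame.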
Attempts:
import pandas
my_file = "NEISS2014.csv"
df = pandas.read_csv(my_file)
df.groupby(df['sentence'].map(lambda x:'apple' if 'apple' in x else x)).sum()
df[df['sentence'].str.contains("apple") == True]
If anyone can help me debug this, I'd greatly appreciate it!
Answer 0 (score: 2)
I think you can use str.count on the sentence column:
print(df)
#                                             sentence
# 0    Sam ate an apple and she felt great apple apple
# 1  Jill thinks the sky is purple but Bob says it'...
# 2          Ralph wants to go apple picking this fall
print(df.columns)
# Index(['sentence'], dtype='object')
# Count the occurrences of 'apple' in each row of the column
df['count'] = df['sentence'].str.count('apple')
print(df)
#                                             sentence  count
# 0    Sam ate an apple and she felt great apple apple      3
# 1  Jill thinks the sky is purple but Bob says it'...      0
# 2          Ralph wants to go apple picking this fall      1
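Note that str.count treats its argument as a regular expression and counts substrings, so a word like "pineapple" would also be counted. Here is a minimal sketch of a stricter variant, in case that matters for your data. The in-memory CSV, the id column, and the extra rows are made up for illustration; only the sentence column name comes from the question. It matches whole words only, ignores case, and treats empty cells as zero matches:

```python
import io
import re

import pandas as pd

# Hypothetical stand-in for the real file; only the "sentence" column
# name is taken from the question, the rest is invented sample data.
csv_text = """id,sentence
1,Sam ate an apple and she felt great
2,Jill thinks the sky is purple but Bob says it's blue
3,Ralph wants to go apple picking this fall
4,An Apple a day and a pineapple
5,
"""

# usecols loads only the column of interest, which helps with large files.
df = pd.read_csv(io.StringIO(csv_text), usecols=["sentence"])

word = "apple"
# re.escape guards against regex metacharacters in the search term;
# \b restricts matches to whole words.
pattern = r"\b" + re.escape(word) + r"\b"

# Empty cells are read as NaN; replace them with "" so they count as 0.
df["count"] = df["sentence"].fillna("").str.count(pattern, flags=re.IGNORECASE)

# Total number of matches across the whole column.
total = int(df["count"].sum())

print(df)
print("total:", total)
```

With the real file, you would pass the file path to read_csv instead of the io.StringIO object. If you want plain substring counts, as in the answer above, drop the \b anchors.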