I have 2 CSV files, dictionary.csv and story.csv. I want to count how many words in each row of story.csv match words in dictionary.csv.
Below are truncated samples:
Story.csv
id STORY
0 Jennie have 2 shoes, a red heels and a blue sneakers
1 The skies are pretty today
2 One of aesthetic color is grey
Dictionary.csv
red
green
grey
blue
black
My expected output is:
output.csv
id STORY Found
0 Jennie have 2 shoes, a red heels and a blue sneakers 2
1 The skies are pretty today 0
2 One of aesthetic color is grey 1
This is the code I have so far, but I only get NaN (empty cells):
import pandas as pd
import csv
news=pd.read_csv("Story.csv")
dictionary=pd.read_csv("Dictionary.csv")
news['STORY'].value_counts()
news['How many found in 1'] = dictionary['Lists'].map(news['STORY'].value_counts())
news.to_csv("output.csv")
I also tried using .str.count, but I keep getting zeros.
Answer 0 (score: 1)
Try this:
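For reference, a minimal reproduction of why the attempt above produces only NaN. It rebuilds the sample data in memory (the `Lists` column name is taken from the question's own `dictionary['Lists']`); the point is that `value_counts()` counts whole stories, not words, so mapping individual words onto it finds nothing.

```python
import pandas as pd

# Same shapes as the question: 3 stories and 5 dictionary words.
news = pd.DataFrame({
    "id": [0, 1, 2],
    "STORY": [
        "Jennie have 2 shoes, a red heels and a blue sneakers",
        "The skies are pretty today",
        "One of aesthetic color is grey",
    ],
})
dictionary = pd.DataFrame({"Lists": ["red", "green", "grey", "blue", "black"]})

# value_counts() is indexed by the *whole* story strings, each seen once.
story_counts = news["STORY"].value_counts()

# map() looks each dictionary word up as a key in that index. No single word
# equals a whole story, so every lookup misses and yields NaN.
mapped = dictionary["Lists"].map(story_counts)

# Assigning a Series aligns on the index: news rows 0-2 take mapped rows 0-2
# (all NaN); dictionary rows 3-4 have no matching news row and are dropped.
news["How many found in 1"] = mapped
print(news)
```

In other words, the counting has to happen per story (look for each dictionary word inside the story text), not per dictionary word.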
import pandas as pd

# Create the sample data frame
data = {'id': [0, 1, 2],
        'STORY': ['Jennie have 2 shoes, a red heels and a blue sneakers',
                  'The skies are pretty today',
                  'One of aesthetic color is grey']}
word_list = ['red', 'green', 'grey', 'blue', 'black']
df = pd.DataFrame(data)

# Start counting: for each story, sum how often each dictionary word occurs.
# Note that str.count matches substrings and is case-sensitive,
# so 'red' is also counted inside 'bored', while 'Red' is not counted.
df['Found'] = df['STORY'].astype(str).apply(lambda t: pd.Series({word: t.count(word) for word in word_list}).sum())

# Alternatively, without building an intermediate Series:
# df['Found'] = df['STORY'].astype(str).apply(lambda t: sum(t.count(word) for word in word_list))
Output
df
# id STORY Found
#0 0 Jennie have 2 shoes, a red heels and a blue sneakers 2
#1 1 The skies are pretty today 0
#2 2 One of aesthetic color is grey 1
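The question mentions that `.str.count` kept returning zeros. As a hedged sketch (not the answerer's code), `Series.str.count` does work once it is given a single regular expression built from the whole dictionary. This variant also adds word boundaries (`\b`) and case-insensitive matching; both are assumptions about what should count as a "match", so drop them if substring matching is what you want.

```python
import re

import pandas as pd

# Sample data, same as in the answer above.
df = pd.DataFrame({
    "id": [0, 1, 2],
    "STORY": [
        "Jennie have 2 shoes, a red heels and a blue sneakers",
        "The skies are pretty today",
        "One of aesthetic color is grey",
    ],
})
word_list = ["red", "green", "grey", "blue", "black"]

# Build one alternation such as r"\b(?:red|green|grey|blue|black)\b".
# re.escape guards against dictionary entries containing regex metacharacters,
# and \b restricts matches to whole words, so "red" no longer matches "bored".
pattern = r"\b(?:" + "|".join(map(re.escape, word_list)) + r")\b"

# Series.str.count counts the non-overlapping regex matches in each string.
df["Found"] = df["STORY"].astype(str).str.count(pattern, flags=re.IGNORECASE)
print(df)
```

On the sample data this gives the same 2, 0, 1 as the answer; the two approaches only diverge on text like "bored" or "Red". The trade-off is that `\b` also stops plurals such as "reds" from being counted.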
Bonus edit: if you want to see a detailed per-word breakdown, run
df['STORY'].astype(str).apply(lambda t: pd.Series({word: t.count(word) for word in word_list}))
# red green grey blue black
#0 1 0 0 1 0
#1 0 0 0 0 0
#2 0 0 1 0 0
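To tie this back to the question's files, a sketch of the whole pipeline: read both CSVs, count whole-word matches and write the result. It assumes Dictionary.csv has a single header column named `Lists` (as the question's `dictionary['Lists']` implies) and Story.csv has `id` and `STORY` columns. `count_dictionary_matches` is a hypothetical helper name, and `io.StringIO` stands in for the real files so the snippet runs on its own.

```python
import io
import re
from typing import Iterable

import pandas as pd


def count_dictionary_matches(stories: pd.Series, words: Iterable[str]) -> pd.Series:
    """Hypothetical helper: whole-word, case-insensitive match count per story."""
    words = [w for w in words if w]  # skip empty entries
    if not words:
        # An empty alternation would match at every word boundary, so short-circuit.
        return pd.Series(0, index=stories.index)
    pattern = r"\b(?:" + "|".join(map(re.escape, words)) + r")\b"
    return stories.astype(str).str.count(pattern, flags=re.IGNORECASE)


# In-memory stand-ins for Story.csv and Dictionary.csv; with the real files,
# pass "Story.csv" and "Dictionary.csv" to pd.read_csv instead.
story_csv = io.StringIO(
    "id,STORY\n"
    '0,"Jennie have 2 shoes, a red heels and a blue sneakers"\n'
    "1,The skies are pretty today\n"
    "2,One of aesthetic color is grey\n"
)
dictionary_csv = io.StringIO("Lists\nred\ngreen\ngrey\nblue\nblack\n")

news = pd.read_csv(story_csv)
dictionary = pd.read_csv(dictionary_csv)

# Clean the word list: drop missing values, trim whitespace, remove duplicates.
words = dictionary["Lists"].dropna().astype(str).str.strip().unique().tolist()

news["Found"] = count_dictionary_matches(news["STORY"], words)

# With real files: news.to_csv("output.csv", index=False).
# index=False keeps pandas from writing the row index as an extra first column.
csv_text = news.to_csv(index=False)
print(csv_text)
```

If Dictionary.csv actually has no header row (as the truncated sample suggests), read it with `pd.read_csv("Dictionary.csv", header=None, names=["Lists"])` so the first word is not swallowed as a column name.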