I have an Excel file in which column A (Name) and column B (Description) hold detailed descriptions of personal profiles. It looks like this:
Name Description
James R A good systems developer...
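I am trying to count how many times the word "good" appears in each row of the Description column and create a new column holding that count. I have a lot of values, so I would rather use pandas than Excel formulas. The output should look like this: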
Name Description Good
James R A good systems developer... 1
The Python code I wrote is:
In [1]: import collections
In [2]: import pandas as pd
In [3]: df=pd.read_excel('israel2013.xls')
In [4]: str1=df.description
In [5]: str2= 'good'
In [6]: for index, row in df.iterrows():
   ...:     if str2 in str1:
   ...:         counter=collections.Counter (r[0] for str2 in str1)
   ...:     else:
   ...:         print (0)
But I get all zeros from it, and I can't figure out what is wrong. Thanks.
Answer 0 (score: 1)
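As background on why the loop in the question prints only zeros: the `in` operator on a pandas Series tests the index labels, not the strings stored in it, so `str2 in str1` is False on every iteration. The snippet below is a minimal sketch of that behaviour; the two-row frame and the names `col` and `per_row` are made up for illustration.

```python
import pandas as pd

# Hypothetical two-row frame standing in for the Excel data.
df = pd.DataFrame({"Description": ["A good systems developer", "a guy called Bob"]})
col = df["Description"]

# `in` on a Series checks the index labels (0, 1, ...), not the values.
print("good" in col)  # False, although row 0 contains the word
print(0 in col)       # True, because 0 is an index label

# Checking each string individually is presumably what the loop intended.
# Note: this counts substrings, so "goodness" would also count as a hit.
per_row = [text.lower().count("good") for text in col]
print(per_row)
```

The plain Python loop is only meant to illustrate the problem; a vectorized string method is usually the more idiomatic choice in pandas.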
Demo DataFrame:
>>> data = [['James R', 'A good systems developer'], ['Bob C', 'a guy called Bob'], ['Alice R', 'Good teacher and a good runner']]
>>> df = pd.DataFrame(data, columns=['Name', 'Description'])
>>>
>>> df
Name Description
0 James R A good systems developer
1 Bob C a guy called Bob
2 Alice R Good teacher and a good runner
Solution:
>>> df['Good'] = df.Description.str.count(r'(?i)\bgood\b')
>>> df
Name Description Good
0 James R A good systems developer 1
1 Bob C a guy called Bob 0
2 Alice R Good teacher and a good runner 2
\b marks a word boundary, and (?i) makes the search case-insensitive. Instead of using (?i), you can also import re and pass flags=re.IGNORECASE as the second argument to count.
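The `flags` variant mentioned above could look like the following sketch. It rebuilds the same three demo rows so that it runs on its own and should give the same counts as the `(?i)` pattern.

```python
import re

import pandas as pd

# Same three demo rows as above, rebuilt so the snippet is self-contained.
df = pd.DataFrame(
    {
        "Name": ["James R", "Bob C", "Alice R"],
        "Description": [
            "A good systems developer",
            "a guy called Bob",
            "Good teacher and a good runner",
        ],
    }
)

# The pattern no longer needs the inline (?i); case-insensitivity comes from flags.
df["Good"] = df["Description"].str.count(r"\bgood\b", flags=re.IGNORECASE)
print(df["Good"].tolist())  # [1, 0, 2]
```

Which form to use is mostly a matter of taste: the inline `(?i)` keeps everything in the pattern string, while `flags=` keeps the pattern itself shorter.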
Answer 1 (score: 0)
Try:
df['Good'] = df['description'].str.findall('good').str.len()
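One thing to keep in mind with this approach: the plain pattern 'good' is case-sensitive and also matches inside longer words. The sketch below compares it with a word-boundary, case-insensitive pattern. The sample rows are invented for illustration, and the column is assumed to be named `Description`, as in the demo frame of the other answer.

```python
import re

import pandas as pd

# Made-up rows chosen to show the difference between the two patterns.
df = pd.DataFrame({"Description": ["Good teacher and a good runner", "goodness knows"]})

# Plain 'good': misses the capitalized "Good" and matches inside "goodness".
plain = df["Description"].str.findall("good").str.len()

# Word boundaries plus IGNORECASE: whole-word, case-insensitive matches only.
whole = df["Description"].str.findall(r"\bgood\b", flags=re.IGNORECASE).str.len()

print(plain.tolist())  # [1, 1]
print(whole.tolist())  # [2, 0]
```

If only the number of matches is needed, `str.count` with the same pattern avoids building the intermediate lists.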