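I have a list of keywords, and I'm trying to check whether any of these words appear in my CSV file; if one does, the row should be flagged. My code works perfectly, except that a row containing more than one keyword does not get flagged. Any ideas?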
import sys
import csv
nk = ('aaa','bbb','ccc')
with open(sys.argv[1], "rb") as f:
    reader = csv.reader(f, delimiter = '\t')
    for row in reader:
        string=str(row)
        if any(word in string for word in nk):
            row.append('***')
            print '\t'.join(row)
        else:
            print '\t'.join(row)
Thanks in advance!
Answer 0 (score: 0)
Use set intersection to get all the words a row has in common with the keyword set:
nk = {'aaa','bbb','ccc'}
seen = set()  # keep track of the keywords seen so far
with open(sys.argv[1], "rb") as f:
    ...
    for row in reader:
        # update `seen` with the items common to `nk` and the current `row`
        seen.update(nk.intersection(row))
        ...
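To make the accumulation step concrete, here is a minimal sketch of what `nk.intersection(row)` returns for each row and how `seen` grows. The `rows` list is made-up sample data standing in for what `csv.reader` would yield, and `per_row` is a hypothetical helper list added only for illustration.

```python
nk = {'aaa', 'bbb', 'ccc'}

# Hypothetical rows, standing in for the lists that csv.reader would yield
rows = [
    ['aaa', 'x', 'bbb'],
    ['y', 'z'],
    ['ccc', 'aaa'],
]

seen = set()   # every keyword found in any row so far
per_row = []   # keywords found in each individual row (illustration only)
for row in rows:
    # intersection() accepts any iterable; a field matches only if it equals a keyword
    common = nk.intersection(row)
    per_row.append(common)
    seen.update(common)

print(per_row)
print(seen)
```

A row with several keywords simply yields a larger `common` set, so testing whether `common` is non-empty is enough to decide whether to flag the row.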
Don't convert `row` to a string (`string=str(row)`). The `in` operator also works on lists, where it behaves differently from `in` on strings:
>>> strs = "['foo','abarc']"
>>> 'bar' in strs #substring search
True
>>> lis = ['foo','abarc'] #item search
>>> 'bar' in lis
False
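Putting the two points together, the flagging itself might look like the sketch below. It is written for Python 3 (the question's code uses the Python 2 `print` statement), and `flag_rows`, `NK`, and the sample data are hypothetical names and values chosen for illustration. It assumes a keyword should match only when it equals a whole field, not when it is a substring of one.

```python
import csv
import io

NK = {'aaa', 'bbb', 'ccc'}

def flag_rows(lines, keywords=NK, marker='***'):
    """Yield each tab-separated row, appending marker when any field equals a keyword."""
    reader = csv.reader(lines, delimiter='\t')
    for row in reader:
        # isdisjoint() is False as soon as the row shares at least one item with keywords
        if not keywords.isdisjoint(row):
            row.append(marker)
        yield row

# Made-up sample input; in the question this would be the file named by sys.argv[1]
sample = io.StringIO("aaa\tfoo\tbbb\nxaaax\tbar\nccc\tbaz\n")
for row in flag_rows(sample):
    print('\t'.join(row))
```

With item matching, the field `xaaax` in the second row is not treated as a hit, whereas the substring test on `str(row)` would have flagged it. If matching inside fields is actually wanted, the check would need to test each field with `in` instead.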