Question

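I have a list of keywords, and I am trying to check whether any of them appear in my CSV sheet; if one does, the row should be flagged. My code works perfectly, except that a row containing more than one of the keywords does not get flagged. Any ideas?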

import sys
import csv
nk = ('aaa','bbb','ccc')
with open(sys.argv[1], "rb") as f:
    reader = csv.reader(f, delimiter = '\t')
    for row in reader:
        string=str(row)
        if any(word in string for word in nk):
            row.append('***')
            print '\t'.join(row)
        else:
            print '\t'.join(row)

Thanks in advance!

Answer 1

Use set intersection to get all the words that `nk` and the rows have in common:

nk = {'aaa','bbb','ccc'}
seen = set()             # keeps track of the keywords seen so far
with open(sys.argv[1], "rb") as f:
    ...
    for row in reader:
        #update `seen` with the items found common between `nk` and the current `row`
        seen.update(nk.intersection(row))
    ...
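
If the goal is to flag each row rather than collect the keywords seen, the same intersection can be tested per row. The following is only a minimal sketch, written for Python 3 (the code above is Python 2). The helper name `flag_rows` and the in-memory sample data are assumptions made for illustration and are not part of the original code.

```python
import csv
import io

KEYWORDS = {"aaa", "bbb", "ccc"}  # same keywords as `nk` above


def flag_rows(lines, keywords=KEYWORDS, marker="***"):
    """Yield each tab-separated row, appending `marker` when a field equals a keyword."""
    for row in csv.reader(lines, delimiter="\t"):
        found = keywords.intersection(row)  # keywords present as whole fields
        if found:
            row.append(marker)
        yield row


# Hypothetical sample data standing in for the file named by sys.argv[1].
sample = io.StringIO("aaa\tfoo\tbbb\nfoo\tbar\tbaz\nxyz\tccc\tqux\n")
flagged = list(flag_rows(sample))
for row in flagged:
    print("\t".join(row))
```

The first sample row contains two keywords and is flagged once. To read a real file in Python 3, passing `open(sys.argv[1], newline="")` in place of `sample` should work, since the csv module expects text-mode files opened with `newline=""`.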

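Don't convert row to a string (string=str(row)). The in operator also works on lists, but it behaves differently there: on a list it tests whether some item is equal to the value, while on a string it does a substring search: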

>>> strs = "['foo','abarc']"
>>> 'bar' in strs            #substring search
True
>>> lis = ['foo','abarc']    #item search
>>> 'bar' in lis
False
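
Which behaviour is wanted depends on whether a keyword must fill a whole field or may appear anywhere inside one. A rough sketch of both checks follows, in Python 3. The function names `has_keyword_field` and `has_keyword_substring` are hypothetical, chosen only for this illustration.

```python
KEYWORDS = {"aaa", "bbb", "ccc"}  # same keywords as `nk` in the question


def has_keyword_field(row, keywords=KEYWORDS):
    """True if some field of `row` is exactly equal to a keyword (item search)."""
    return not keywords.isdisjoint(row)


def has_keyword_substring(row, keywords=KEYWORDS):
    """True if some keyword occurs anywhere inside some field (substring search)."""
    return any(word in field for field in row for word in keywords)


row = ["xaaay", "foo"]
print(has_keyword_field(row))      # "xaaay" is not equal to any keyword
print(has_keyword_substring(row))  # "aaa" occurs inside "xaaay"
```

Checking each field separately also avoids matching against the brackets, quotes, and commas that str(row) adds to the text.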
