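I must be missing something really obvious.
I have a list of tuples, each a (phrase, number) pair. I want to remove every tuple whose phrase contains a word from my stopword list.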
stopwords = ['for', 'with', 'and', 'in', 'on', 'down']
tup_list = [('faucet', 5185), ('kitchen', 2719), ('faucets', 2628),
('kitchen faucet', 1511), ('shower', 1471), ('bathroom', 1131),
('handle', 1048), ('for', 1035), ('cheap', 960), ('bronze', 807),
('tub', 797), ('sale', 771), ('sink', 762), ('with', 696),
('single', 620), ('kitchen faucets', 615), ('stainless faucet', 613),
('pull', 603), ('and', 477), ('in', 447), ('single handle', 430),
('for sale', 406), ('bathroom faucet', 392), ('on', 369),
('down', 363), ('head', 359), ('pull down', 357), ('wall', 351),
('faucet with', 350)]
for p,n in tup_list:
    print('p', p, p.split(), any(phrase in stopwords for phrase in p.split()))
print(len(tup_list))
for p,n in tup_list:
    if any(phrase in stopwords for phrase in p.split()):
        tup_list.remove((p,n))
        print('Removing', p)
print(len(tup_list))
print([item for item in tup_list if item[0] == 'in'])
When I run the above, I get the following output:
p faucet ['faucet'] False
p kitchen ['kitchen'] False
p faucets ['faucets'] False
p kitchen faucet ['kitchen', 'faucet'] False
p shower ['shower'] False
p bathroom ['bathroom'] False
p handle ['handle'] False
p for ['for'] True
p cheap ['cheap'] False
p bronze ['bronze'] False
p tub ['tub'] False
p sale ['sale'] False
p sink ['sink'] False
p with ['with'] True
p single ['single'] False
p kitchen faucets ['kitchen', 'faucets'] False
p stainless faucet ['stainless', 'faucet'] False
p pull ['pull'] False
p and ['and'] True
p in ['in'] True
p single handle ['single', 'handle'] False
p for sale ['for', 'sale'] True
p bathroom faucet ['bathroom', 'faucet'] False
p on ['on'] True
p down ['down'] True
p head ['head'] False
p pull down ['pull', 'down'] True
p wall ['wall'] False
p faucet with ['faucet', 'with'] True
29
Removing for
Removing with
Removing and
Removing for sale
Removing on
Removing pull down
Removing faucet with
22
[('in', 447)]
My question: why isn't the tuple ('in', 447) removed? The output shows p in ['in'] True, meaning 'in' is in the stopword list, so why doesn't tup_list.remove((p,n)) remove it?
Answer 0 (score: 0)
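When you remove an item from a list, the indices of all later items shift down by one. Iterating over a list while modifying it therefore gives unexpected results: the loop's internal position keeps advancing, so the item that slides into the slot just vacated is never examined. In your data, ('in', 447) comes immediately after ('and', 477), so removing 'and' causes 'in' to be skipped. ('down', 363) is skipped the same way after 'on' is removed.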
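The skipping can be reproduced on a much smaller list. This is only an illustrative sketch: the names `words`, `stop`, and `visited` are made up for the demonstration and are not part of the question's code.

```python
# Minimal reproduction: removing from a list while iterating over it.
words = ['a', 'and', 'in', 'b']
stop = {'and', 'in'}

visited = []              # records which items the loop actually sees
for w in words:
    visited.append(w)
    if w in stop:
        words.remove(w)   # later items shift left by one position

print(visited)  # ['a', 'and', 'b']  ('in' was never visited)
print(words)    # ['a', 'in', 'b']   ('in' survives)
```

After 'and' is removed at position 1, 'in' moves into position 1, but the loop moves on to position 2, which now holds 'b'.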
Here is one solution. It isn't the most efficient, but it may suit your needs.
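# First pass: record the positions to drop without modifying the list.
remove_indices = []
for i, (p, n) in enumerate(tup_list):
    if any(phrase in stopwords for phrase in p.split()):
        remove_indices.append(i)
        print('Removing', p)
# Second pass: build a new list that leaves out the recorded positions.
tup_list = [item for j, item in enumerate(tup_list) if j not in remove_indices]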
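As a possible alternative, assuming you don't need the "Removing" printouts, the filtered list can be built in a single pass with a list comprehension. This is a sketch rather than the only way to do it: `remove_stopword_phrases` is a hypothetical helper name, and `sample` is a shortened slice of the list from the question.

```python
def remove_stopword_phrases(pairs, stopwords):
    """Return the (phrase, number) pairs whose phrase contains no stopword."""
    stop = set(stopwords)  # a set gives constant-time membership tests
    # Keep a pair only if none of the words in its phrase is a stopword.
    return [(p, n) for p, n in pairs if stop.isdisjoint(p.split())]


stopwords = ['for', 'with', 'and', 'in', 'on', 'down']
sample = [('faucet', 5185), ('and', 477), ('in', 447), ('for sale', 406),
          ('on', 369), ('down', 363), ('pull down', 357), ('wall', 351)]

filtered = remove_stopword_phrases(sample, stopwords)
print(filtered)  # [('faucet', 5185), ('wall', 351)]
```

Because a new list is built instead of mutating the one being iterated, no items are skipped. If other code holds a reference to the original list, assigning with `tup_list[:] = remove_stopword_phrases(tup_list, stopwords)` updates it in place.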