My list:
city=['Venango Municiplaity', 'Waterford ship','New York']
Expected result:
city = ['Venango Municiplaity ', 'Waterford ship','New York','Venango','Waterford']
Common words:
common_words = ['ship','municipality']
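I want to scan every item in my list, strip the common words from it, and re-insert the stripped item into the same list, as shown in the expected result.
I can find the items that contain a common word, but I am not sure how to replace the word with an empty string and re-insert the result into my list.
My code so far: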
for item in city:
    if any(x in s.lower() for s in item.split(' ') for x in common_words):
        ...  # unsure how to strip the word and re-insert the item here
Answer 0 (score: 8)
I wrote a short piece of code that works as expected:
city = ['Venango Municiplaity', 'Waterford ship', 'New York']
comwo = ['ship', 'municipality']
# Iterate over a copy so that appending to city does not affect the loop.
for c in list(city):
    for ii in comwo:
        if ii in c:
            city.append(c.replace(ii, ""))
print(city)
Output:
['Venango Municiplaity', 'Waterford ship', 'New York', 'Waterford ']
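The list you created contains a misspelling. Compare the first element of city, Venango Municiplaity, with the second element of common_words, municipality. Because the spellings differ, that item is never matched.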
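A quick check, as a minimal sketch, shows why the first item is skipped: the substring test only succeeds when the spelling matches exactly. Lowercasing is added here as an assumption so that capitalization does not interfere.

```python
city = ['Venango Municiplaity', 'Waterford ship', 'New York']
common_words = ['ship', 'municipality']

# For each item, collect the common words that occur in it as substrings.
matches = {c: [w for w in common_words if w in c.lower()] for c in city}
print(matches)
# 'Venango Municiplaity' maps to an empty list because of the transposed letters.
```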
If you also want to remove the space in front of the word (when there is one), here is a separate version:
city = ['Village home', 'Villagehome', 'New York']
comwo = ['home']
# Iterate over a copy so that appending to city does not affect the loop.
for c in list(city):
    for ii in comwo:
        if ii in c:
            # Prefer removing the word together with the space before it.
            without_space = c.replace(" " + ii, "")
            city.append(without_space if without_space != c else c.replace(ii, ""))
print(city)
Output:
['Village home', 'Villagehome', 'New York', 'Village', 'Village']
Answer 1 (score: 7)
I suggest the following solution, which uses re.sub with flags=re.IGNORECASE to remove the common words regardless of case:
import re
city = ['Venango Municipality', 'Waterford ship','New York']
common_words = ['ship','municipality']
toAppend = []
for c in city:
    for cw in common_words:
        # Only act when the common word appears as a whole word in the item.
        if cw.lower() in c.lower().split():
            toAppend.append(re.sub(cw, "", c, flags=re.IGNORECASE).strip())
city += toAppend
print(city) # ['Venango Municipality', 'Waterford ship', 'New York', 'Venango', 'Waterford']
Here is a one-liner version using a list comprehension. It is shorter but considerably less readable:
import re
city = ['Venango Municipality', 'Waterford ship','New York']
common_words = ['ship','municipality']
city += [re.sub(cw, "", c, flags=re.IGNORECASE).strip() for c in city for cw in common_words if cw.lower() in c.lower().split()]
print(city) # ['Venango Municipality', 'Waterford ship', 'New York', 'Venango', 'Waterford']
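Because re.sub treats cw as a pattern, a common word containing regex metacharacters could misbehave, and a plain substitution also removes the word when it occurs inside a longer word. A minimal sketch, assuming a hypothetical helper named strip_common_words (not part of the answer above), escapes each word and anchors it with \b so that only whole words are removed:

```python
import re

def strip_common_words(names, common_words):
    """Hypothetical helper: return names plus a stripped copy of each
    name that contained a common word as a whole word."""
    # One alternation of escaped words, matched only at word boundaries.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in common_words) + r")\b",
        flags=re.IGNORECASE,
    )
    extra = []
    for name in names:
        stripped = pattern.sub("", name)
        if stripped != name:
            # Collapse the whitespace left behind by the removed word.
            extra.append(" ".join(stripped.split()))
    return names + extra

# 'Shipton' is an invented item used to show that partial matches are left alone.
result = strip_common_words(
    ['Venango Municipality', 'Waterford ship', 'New York', 'Shipton'],
    ['ship', 'municipality'],
)
print(result)
```

Returning a new list rather than extending the input in place avoids modifying a list while it is being iterated.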
Answer 2 (score: 6)
You can try this: create a new list to hold the items that should be added to the original list, then concatenate the two:
In [1]: city=['Venango Municiplaity', 'Waterford ship','New York']
In [2]: common_words = ['ship', 'municiplaity']
In [3]: list_add = []
In [4]: for item in city:
   ...:     item_words = [s.lower() for s in item.split(' ')]
   ...:     if set(common_words) & set(item_words):
   ...:         new_item = [s for s in item.split(' ') if s.lower() not in common_words]
   ...:         list_add.append(" ".join(new_item))
   ...:
In [5]: city + list_add
Out[5]: ['Venango Municiplaity', 'Waterford ship', 'New York', 'Venango', 'Waterford']
Answer 3 (score: 4)
Here is one approach using a regular expression.
Demo:
import re
city=['Venango Municiplaity', 'Waterford ship','New York']
common_words = ['ship','municiplaity']
common_words = "(" + "|".join(common_words) + ")"
res = []
for i in city:
    if re.search(common_words, i, flags=re.IGNORECASE):
        # Keeps only the first word of the matching item.
        res.append(i.strip().split()[0])
print(city + res)
Output:
['Venango Municiplaity', 'Waterford ship', 'New York', 'Venango', 'Waterford']
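Note that split()[0] keeps only the first word, which works for the sample data but would shorten a multi-word name. The following sketch uses an invented item, 'New York ship', to compare that behavior with dropping only the words that match the pattern; the variable names are illustrative assumptions.

```python
import re

common_words = ['ship', 'municiplaity']
pattern = re.compile("(" + "|".join(common_words) + ")", flags=re.IGNORECASE)

name = 'New York ship'  # hypothetical multi-word item, not in the original list

# Approach from the answer: keep only the first word.
first_word = name.strip().split()[0]

# Alternative: drop only the words that match a common word in full.
kept_words = " ".join(w for w in name.split() if not pattern.fullmatch(w))

print(first_word)
print(kept_words)
```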
Answer 4 (score: 4)
You can use a list comprehension to work out whether an item yields something that should be added to the city list.
city=['Venango Municipality', 'Waterford ship','New York']
common_words = ['ship','municipality']
items_to_add = []
for item in city:
    # Keep every word of the item that is not a common word.
    toAddition = [word for word in item.split() if word.lower() not in common_words]
    if ' '.join(toAddition) != item:
        items_to_add.append(' '.join(toAddition))
print(city + items_to_add)
Output:
['Venango Municipality', 'Waterford ship', 'New York', 'Venango', 'Waterford']
Answer 5 (score: 4)
Put the results in a separate list, then use list.extend() to append the contents of that list to the original list:
cities = ['Venango Municipality', 'Waterford ship', 'New York']
common_words = ['ship', 'municipality']
add_list = []
for city in cities:
    rl = []            # words of this city that are not common words
    triggered = False  # set once a common word has been found
    for city_word in city.split():
        if city_word.lower() in common_words:
            triggered = True
        else:
            rl.append(city_word)
    if triggered:
        add_list.append(' '.join(rl))
cities.extend(add_list)
print(cities)
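Answer 6 (score: 0)
An approach that splits each item into words:
city = ['Venango Municipality', 'Waterford ship', 'New York']
common_words = ['ship', 'municipality']
# Iterate over a copy, since city is extended inside the loop.
for item in list(city):
    word_list = str(item).split(" ")
    kept = [word for word in word_list if word.lower() not in common_words]
    # Only add a new entry when at least one common word was removed.
    if len(kept) != len(word_list):
        city.append(" ".join(kept))
print(city)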
Output:
['Venango Municipality', 'Waterford ship', 'New York', 'Venango', 'Waterford']
Answer 7 (score: 0)
Try using extend (this assumes every item has at least two words and that the common word comes second):
city.extend([i.split()[0] for i in city if i.split()[1].lower() in map(str.lower,common_words)])
Demo:
>>> city=['Venango Municipality', 'Waterford ship','New York']
>>> common_words = ['ship','municipality']
>>> city.extend([i.split()[0] for i in city if i.split()[1].lower() in map(str.lower,common_words)])
>>> city
['Venango Municipality', 'Waterford ship', 'New York', 'Venango', 'Waterford']
>>>
If you want to tolerate misspellings:
>>> city=['Venango Municiplaity', 'Waterford ship','New York']
>>> common_words = ['ship','municipality']
>>> from difflib import SequenceMatcher
>>> city.extend([i.split()[0] for i in city if any(SequenceMatcher(None,i.split()[1].lower(),v).ratio()>0.8 for v in map(str.lower,common_words))])
>>> city
['Venango Municiplaity', 'Waterford ship', 'New York', 'Venango', 'Waterford']
>>>
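The same idea can be sketched without relying on the word position, using difflib.get_close_matches from the standard library. The helper name strip_fuzzy and the extra single-word item 'Erie' are assumptions made for illustration; the 0.8 cutoff mirrors the threshold used above.

```python
from difflib import get_close_matches

def strip_fuzzy(names, common_words, cutoff=0.8):
    """Hypothetical helper: drop every word that closely matches a common word."""
    lowered = [w.lower() for w in common_words]
    extra = []
    for name in names:
        words = name.split()
        # A word is kept when no common word is similar enough to it.
        kept = [w for w in words
                if not get_close_matches(w.lower(), lowered, n=1, cutoff=cutoff)]
        if len(kept) != len(words):
            extra.append(" ".join(kept))
    return names + extra

result = strip_fuzzy(
    ['Venango Municiplaity', 'Waterford ship', 'New York', 'Erie'],
    ['ship', 'municipality'],
)
print(result)
```

Checking every word instead of split()[1] means single-word items such as 'Erie' do not raise an IndexError.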