I want to compare two lists and extract the matching content
The two input lists:
colours = ["yellow", "light pink", "red", "dark blue", "red"]
items = ["the sun is yellow but the sunset is red ",
"the pink cup is pretty under the light",
"it seems like the flower is red",
"skies are blue",
"i like red"]
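Expected result:
["yellow", "pink light", "red", "blue", "red"]
If an entry in colours has two words, it is split into single words. As you can see, the order of the words ("pink", "light") does not matter, because the entry is broken into single words that are each compared against the sentence. Note that for the first entry of items, "red" must not be extracted even though "red" appears in the colours list, because that "red" sits at a different index than the item.
For the fourth pair, "dark blue" and "skies are blue", the result should show only "blue", because "dark" does not occur in the item.
I tried to code this, but my loops do not compare the lists index by index. Instead every colour is compared with every item, so "red" is repeated.
My code: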
colours=["yellow","light pink","red"," dark blue","red"]
items=["the sun is yellow but the sunset is red ","the pink cup is pretty under the light", "it seems like the flower is red", "skies are blue","i like red"]
for i in colours:
    y=i.split() #split a multi-word colour into single words
    for j in y:
        #iterate word by word in colours that have more than 1 word
        for z in items:
            s=z.split() #split sentences into tokens/words
            for l in s:
                #compare each word in items with each word in colours
                if j == l:
                    print(j)
The output I actually get:
yellow
light
pink
red
red
red
blue
red
red
red
Answer 0 (score: 4)
With zip, this becomes much easier:
colours=["yellow","light pink","red"," dark blue","red"]
items=["the sun is yellow but the sunset is red ","the pink cup is pretty under the light", "it seems like the flower is red", "skies are blue","i like red"]
lst = []
for x, y in zip(colours, items):  # pair each colour with the item at the same index
    word = ''
    for c in y.split():
        if c in x.split():  # whole-word match; `c in x` alone would also match substrings
            word = word + ' ' + c
    lst.append(word.strip())
print(lst)
# ['yellow', 'pink light', 'red', 'blue', 'red']
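A side note on the `in` test used in this kind of loop: applied to a string it checks for substrings, while applied to a list it checks for whole elements. The sketch below uses made-up values (`colour` and `sentence` are not from the question) just to show the difference, so treat it as an illustration rather than part of the answer.

```python
# Hypothetical values, not from the question, chosen to show the difference.
colour = "dark blue"
sentence = "a blue ark"

# Substring test against the whole colour string: "a" and "ark" also match,
# because both occur inside "dark".
by_substring = [w for w in sentence.split() if w in colour]

# Whole-word test against the split colour: only "blue" matches.
by_word = [w for w in sentence.split() if w in colour.split()]

print(by_substring)  # ['a', 'blue', 'ark']
print(by_word)       # ['blue']
```

With the data in the question both tests happen to give the same result, but the whole-word test is the safer choice for other inputs.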
Answer 1 (score: 3)
You can use the following list comprehension:
print([' '.join(w for w in i.split() if w in c.split()) for c, i in zip(colours, items)])
This outputs:
['yellow', 'pink light', 'red', 'blue', 'red']
Answer 2 (score: 1)
Using sets to test membership should be faster, but there is a caveat:
>>> [' '.join(set(colour.split()) & set(item.split()))
...  for colour, item in zip(colours, items)]
['yellow', 'pink light', 'red', 'blue', 'red']
The caveat is that sets are unordered, so "pink light" may come out as "light pink".
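If the order matters, one possible middle ground is to use a set only for the membership test and still walk the sentence word by word, so the matches come out in sentence order. This is a minimal sketch, not a benchmarked solution; the helper name `match_in_order` is made up for illustration.

```python
colours = ["yellow", "light pink", "red", " dark blue", "red"]
items = ["the sun is yellow but the sunset is red ",
         "the pink cup is pretty under the light",
         "it seems like the flower is red",
         "skies are blue",
         "i like red"]

def match_in_order(colour, item):
    """Return the words of `item` that also occur in `colour`, in sentence order."""
    wanted = set(colour.split())  # set gives fast whole-word membership tests
    return ' '.join(w for w in item.split() if w in wanted)

# zip pairs each colour with the item at the same index.
result = [match_in_order(c, i) for c, i in zip(colours, items)]
print(result)
# ['yellow', 'pink light', 'red', 'blue', 'red']
```

Unlike the set intersection, this keeps duplicates: a colour word that occurs twice in the sentence would appear twice in the result.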