Question

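I am trying to iterate over two lists and print an item only if it exists in the second list. I will be doing this with very large files, so I do not want to hold them in memory as a list or dictionary. Is there a way to do this without storing the data in a list or dictionary?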

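I can do the following to confirm that items are not in the list, but I am not sure why it stops working when I remove "not" to confirm that they are in the list.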

Code that verifies an item does not exist in list_2.

list_1 = ['apple',
          'pear',
          'orange',
          'kiwi',
          'strawberry',
          'banana']

list_2 = ['kiwi',
          'melon',
          'grape',
          'pear']

for fruit_1 in list_1:
    if all(fruit_1 not in fruit_2 for fruit_2 in list_2):
        print(fruit_1)

Code that verifies an item exists in list_2.

list_1 = ['apple',
          'pear',
          'orange',
          'kiwi',
          'strawberry',
          'banana']

list_2 = ['kiwi',
          'melon',
          'grape',
          'pear']

for fruit_1 in list_1:
    if all(fruit_1 in fruit_2 for fruit_2 in list_2):
        print(fruit_1)

Answer 1

Here is a solution that uses pandas.read_csv with memory-mapped files:

import pandas as pd

list1 = pd.read_csv('list1.txt', dtype=str, header=None, memory_map=True)
list2 = pd.read_csv('list2.txt', dtype=str, header=None, memory_map=True)

exists = pd.merge(list1, list2, how='inner', on=0)
for fruit in exists[0].tolist():
    print(fruit)

The files list1.txt and list2.txt contain the strings from the question, one string per line.

Output

pear
kiwi

I don't have any large files to experiment with, so I have no performance measurements.
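As a standard-library alternative to the pandas approach, here is a minimal sketch that keeps only one file in memory (as a set) and streams the other line by line. It assumes one item per line and that the smaller file fits in memory; the helper names `common_lines` and `write_lines` are made up for illustration, and the temporary files stand in for the real list1.txt and list2.txt.

```python
import os
import tempfile


def common_lines(small_path, large_path):
    """Yield lines of large_path that also appear in small_path."""
    # Assumption: the smaller file fits in memory as a set.
    with open(small_path) as small_file:
        wanted = {line.strip() for line in small_file}
    # The larger file is read one line at a time, never as a whole.
    with open(large_path) as large_file:
        for line in large_file:
            item = line.strip()
            if item in wanted:
                yield item


def write_lines(directory, name, items):
    """Write one item per line and return the file path (demo helper)."""
    path = os.path.join(directory, name)
    with open(path, "w") as handle:
        handle.write("\n".join(items) + "\n")
    return path


with tempfile.TemporaryDirectory() as tmp:
    list1_path = write_lines(
        tmp, "list1.txt",
        ["apple", "pear", "orange", "kiwi", "strawberry", "banana"])
    list2_path = write_lines(
        tmp, "list2.txt", ["kiwi", "melon", "grape", "pear"])
    # list2 is treated as the small file, list1 is streamed.
    found = list(common_lines(list2_path, list1_path))

print(found)  # ['pear', 'kiwi']
```

The results come out in the order of the streamed file, which matches the output shown above.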

Answer 2

This is how you get them:

exists = [item for item in list_1 if item in list_2]
does_not_exist = [item for item in list_1 if item not in list_2]

And to print them:

for item in exists:
    print(item)
for item in does_not_exist:
    print(item)

But if you only want to print the matches:

for item in list_1:
    if item in list_2:
        print(item)
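The list comprehensions above build both result lists in full before anything is printed. If that is a concern, a generator expression produces the same items one at a time. The sketch below also converts list_2 to a set once so that each membership test does not rescan the list; that conversion is an addition for illustration, not part of the answer.

```python
list_1 = ['apple', 'pear', 'orange', 'kiwi', 'strawberry', 'banana']
list_2 = ['kiwi', 'melon', 'grape', 'pear']

# Build the lookup once; "item in lookup" is then O(1) on average.
lookup = set(list_2)

# Generator expressions: nothing is computed until they are iterated.
exists = (item for item in list_1 if item in lookup)
does_not_exist = (item for item in list_1 if item not in lookup)

# Materialized here only so the results can be inspected.
exists_result = list(exists)
does_not_exist_result = list(does_not_exist)

print(exists_result)          # ['pear', 'kiwi']
print(does_not_exist_result)  # ['apple', 'orange', 'strawberry', 'banana']
```

In real use you would loop over `exists` directly instead of calling `list()` on it.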

Answer 3

You can use Python's sets to compute the items that appear in both lists:

set(list_1).intersection(set(list_2))

See https://docs.python.org/2/library/sets.html
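A short worked example with the lists from the question may help. Note that sets are unordered, so the result is sorted here purely to get a stable display; `difference` is shown as the counterpart for the "does not exist" case.

```python
list_1 = ['apple', 'pear', 'orange', 'kiwi', 'strawberry', 'banana']
list_2 = ['kiwi', 'melon', 'grape', 'pear']

# intersection() and difference() accept any iterable as their argument.
common = set(list_1).intersection(list_2)
only_in_first = set(list_1).difference(list_2)

print(sorted(common))         # ['kiwi', 'pear']
print(sorted(only_in_first))  # ['apple', 'banana', 'orange', 'strawberry']
```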

Answer 4

I was able to get the inverse by comparing the result of the check against True and False.

list_1 = ['apple',
          'pear',
          'orange',
          'kiwi',
          'strawberry',
          'banana']

list_2 = ['kiwi',
          'melon',
          'grape',
          'pear']

# DOES exist
for fruit_1 in list_1:
    if all(fruit_1 not in fruit_2 for fruit_2 in list_2) is False:
        print(fruit_1)

print('\n')

# DOES NOT exist
for fruit_1 in list_1:
    if all(fruit_1 not in fruit_2 for fruit_2 in list_2) is True:
        print(fruit_1)
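The reason this works is De Morgan's law: "not every item fails to match" means the same as "at least one item matches". The following sketch checks that equivalence on the question's data, using the same substring test as the code above.

```python
list_1 = ['apple', 'pear', 'orange', 'kiwi', 'strawberry', 'banana']
list_2 = ['kiwi', 'melon', 'grape', 'pear']

found = []
for fruit_1 in list_1:
    negated_all = not all(fruit_1 not in fruit_2 for fruit_2 in list_2)
    direct_any = any(fruit_1 in fruit_2 for fruit_2 in list_2)
    # The two expressions always agree.
    assert negated_all == direct_any
    if direct_any:
        found.append(fruit_1)

print(found)  # ['pear', 'kiwi']
```

Writing `any(...)` directly is usually easier to read than comparing `all(...)` with `is False`.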

Answer 5

I recommend pandas for large-scale data.

Install it with pip:

pip install pandas

Then you can do something like this:

import pandas as pd

s1 = pd.Index(list_1)
s2 = pd.Index(list_2)

exists = s1.intersection(s2)
does_not_exist = s1.difference(s2)

If you now run print(exists), you will see the items that are in both lists.

See the pandas docs.
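If neither input fits in memory, one option outside pandas is a merge-style scan. The sketch below assumes both inputs are already sorted (for example by an external sort), which the question does not state; the function name `sorted_intersection` is hypothetical. It only ever holds one item from each input at a time, and the sorted lists stand in for sorted files opened for line-by-line reading.

```python
def sorted_intersection(first, second):
    """Yield items present in both inputs. Both must be sorted ascending."""
    first_iter, second_iter = iter(first), iter(second)
    done = object()  # marker meaning "this input is exhausted"
    a = next(first_iter, done)
    b = next(second_iter, done)
    while a is not done and b is not done:
        if a == b:
            yield a
            a = next(first_iter, done)
            b = next(second_iter, done)
        elif a < b:
            # a cannot appear later in second, so skip it.
            a = next(first_iter, done)
        else:
            b = next(second_iter, done)


list_1 = ['apple', 'pear', 'orange', 'kiwi', 'strawberry', 'banana']
list_2 = ['kiwi', 'melon', 'grape', 'pear']

merged = list(sorted_intersection(sorted(list_1), sorted(list_2)))
print(merged)  # ['kiwi', 'pear']
```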

Answer 6

The problem with the code is how the all() function is evaluated. Let's break it down into simpler pieces.

## DOES EXIST
print(all('kiwi' in fruit_2 for fruit_2 in ['pear', 'kiwi']))
print(all('pear' in fruit_2 for fruit_2 in ['pear', 'kiwi']))

evaluates to

False
False

Conversely, if you do something like this

#DOES NOT EXIST
print(all('apple' not in fruit_2 for fruit_2 in ['pear', 'kiwi']))
print(all('pear' not in fruit_2 for fruit_2 in ['pear', 'kiwi']))

evaluates to

True
False

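The reason is that all() returns True only if every element of the iterable is true, and False otherwise. A single non-matching item in list_2 is therefore enough to make the "does exist" check fail.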
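To make that behaviour concrete, here is a plain-Python sketch of what all() does, following the equivalent code given in the Python documentation. The name `all_sketch` is only for illustration.

```python
def all_sketch(iterable):
    """Behaves like the built-in all(): stops at the first false element."""
    for element in iterable:
        if not element:
            return False
    return True


# 'kiwi' is not a substring of 'pear', so the first check already fails.
kiwi_in_every = all_sketch('kiwi' in fruit_2 for fruit_2 in ['pear', 'kiwi'])
# 'apple' is a substring of neither string, so every "not in" check passes.
apple_in_none = all_sketch('apple' not in fruit_2 for fruit_2 in ['pear', 'kiwi'])

print(kiwi_in_every)  # False
print(apple_in_none)  # True
```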

In any case, I think using any() instead of all() for the DOES EXIST part will work.

print("DOES NOT EXIST")
for fruit_1 in list_1:
    # print all(fruit_1 not in fruit_2 for fruit_2 in list_2)
    if all(fruit_1 not in fruit_2 for fruit_2 in list_2):
        print(fruit_1)

print("\nDOES EXIST")
for fruit_1 in list_1:
    if any(fruit_1 in fruit_2 for fruit_2 in list_2):
        print(fruit_1)

DOES NOT EXIST
apple
orange
strawberry
banana

DOES EXIST
pear
kiwi

Answer 7

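One problem with your code is that all() returns False if any single check returns False. Another is that the fruit_1 in fruit_2 part checks whether fruit_1 is a substring of fruit_2. If we were to modify the lists so that your logic works, they would look like: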

list_1 = ['apple',
          'pear',
          'orange',
          'kiwi',
          'berry',
          'banana',
          'grape']

list_2 = ['grape',
          'grape',
          'grape',
          'grape',
          'grape']

But they could also be:

list_1 = ['apple',
          'pear',
          'orange',
          'kiwi',
          'berry',
          'banana',
          'grape']

list_2 = ['strawberry',
          'strawberry',
          'strawberry',
          'strawberry',
          'strawberry',
          'strawberry']

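because berry is a substring of strawberry. If we keep using iteration for this check rather than an intersection, as @wrdeman suggested, then with the dataset you provided it would look like this:

for fruit_1 in list_1:
    if fruit_1 in list_2:
        print(fruit_1)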

Another option is to change all to any, which returns True if any item of the iterable is true. Your code would then look like:

for fruit_1 in list_1:
    if any(fruit_1 == fruit_2 for fruit_2 in list_2):
        print(fruit_1)
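The substring pitfall is easy to see in isolation. In this small sketch the variable names are only for illustration; the point is that `in` means "substring of" when the right-hand side is a string, and "element of" when it is a list.

```python
# String on the right: substring test.
substring_match = 'berry' in 'strawberry'
# List on the right: element test, compared with ==.
element_match = 'berry' in ['strawberry']
# Explicit equality, as in the any() version above.
equality_match = 'berry' == 'strawberry'

print(substring_match)  # True
print(element_match)    # False
print(equality_match)   # False
```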
