Question

I have two lists of tuples; the tuples within each list are unique. The lists have the following format:

[('col1', 'col2', 'col3', 'col4'), ...]

I'm using nested loops to find the members of both lists that have the same values for the given columns, col2 and col3:

temp1 = set([])
temp2 = set([])
for item1 in list1:
    for item2 in list2:
        if item1['col2'] == item2['col2'] and \
            item1['col3'] == item2['col3']:
            temp1.add(item1)
            temp2.add(item2)

This works. But with tens of thousands of items in each list, it takes a very long time to finish.

Using tabular, I can filter list1 against the col2 and col3 of a single item from list2, like this:

list1 = tb.tabular(records=[...], names=['col1','col2','col3','col4'])
...

for (col1, col2, col3, col4) in list2:
    list1[(list1['col2'] == col2) & (list1['col3'] == col3)]

This is obviously "doing it wrong" and is much slower than the first approach.

How can I efficiently check the items of one list of tuples against all the items of another, using numpy or tabular?

Thanks

Answer 1

Try this:

temp1 = set([])
temp2 = set([])

dict1 = dict()
dict2 = dict()

for key, value in zip([tuple(l[1:3]) for l in list1], list1):
    dict1.setdefault(key, list()).append(value)

for key, value in zip([tuple(l[1:3]) for l in list2], list2):
    dict2.setdefault(key, list()).append(value)

for key in dict1:
    if key in dict2:
        temp1.update(dict1[key])
        temp2.update(dict2[key])

It's dirty, but it should work.
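To make the grouping step above easier to follow, here is a minimal sketch of what `setdefault` builds for one list. The sample rows are made up for illustration and are not from the question; the keys are the (col2, col3) pairs, and each value is the list of rows sharing that pair.

```python
# Hypothetical sample rows in the (col1, col2, col3, col4) layout.
list1 = [("a", 1, 2, "x"), ("b", 1, 2, "y"), ("c", 3, 4, "z")]

dict1 = {}
for row in list1:
    # row[1:3] is (col2, col3); rows with the same pair land in the same bucket.
    dict1.setdefault(tuple(row[1:3]), []).append(row)

print(dict1)
```

Once both lists are grouped this way, checking whether a pair occurs in the other list is a single dictionary lookup rather than a scan.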

Answer 2

"How can I efficiently check the items of one list of tuples against all the items of another, using numpy or tabular?"

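Well, I have no experience with tabular and very little with numpy, so I can't give you an exact "canned" solution. But I think I can point you in the right direction. If list1 has length X and list2 has length Y, you are doing X * Y checks, whereas you only need X + Y checks.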

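You should do something like the following (I'll pretend these are lists of regular Python tuples rather than tabular records; I'm sure you can make the necessary adjustments):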

common = {}
for item in list1:
    key = (item[1], item[2])
    if key in common:
        common[key].append(item)
    else:
        common[key] = [item]

first_group = []
second_group = []
for item in list2:
    key = (item[1], item[2])
    if key in common:
        first_group.extend(common[key])
        second_group.append(item)

temp1 = set(first_group)
temp2 = set(second_group)
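As a sanity check on the X + Y idea, the sketch below wraps both approaches in functions and confirms they produce the same sets. The function names `match_nested` and `match_indexed` and the random sample data are assumptions made for illustration; plain tuples are indexed by position, so col2 and col3 are `item[1]` and `item[2]`.

```python
import random
from collections import defaultdict


def match_nested(list1, list2):
    """Baseline from the question: compares every pair (X * Y checks)."""
    temp1, temp2 = set(), set()
    for item1 in list1:
        for item2 in list2:
            if item1[1] == item2[1] and item1[2] == item2[2]:
                temp1.add(item1)
                temp2.add(item2)
    return temp1, temp2


def match_indexed(list1, list2):
    """Index list1 by (col2, col3), then scan list2 once (about X + Y steps)."""
    index = defaultdict(list)
    for item in list1:
        index[(item[1], item[2])].append(item)
    temp1, temp2 = set(), set()
    for item in list2:
        key = (item[1], item[2])
        if key in index:
            temp1.update(index[key])
            temp2.add(item)
    return temp1, temp2


# Hypothetical sample data: unique 4-tuples with small integer columns.
rng = random.Random(0)
list1 = list({(i, rng.randrange(20), rng.randrange(20), i * 2) for i in range(200)})
list2 = list({(i, rng.randrange(20), rng.randrange(20), i * 3) for i in range(200)})

# Both approaches should agree on which rows match.
assert match_indexed(list1, list2) == match_nested(list1, list2)
```

The indexed version does one dictionary insertion per row of list1 and one lookup per row of list2, which is where the X + Y estimate comes from.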

Answer 3

I made a subclass of tuple that has special __eq__ and __hash__ methods:

>>> class SpecialTuple(tuple):
...     def __eq__(self, t):
...             return self[1] == t[1] and self[2] == t[2]
...     def __hash__(self):
...             return hash((self[1], self[2]))
...

It compares col2 and col3 (indexes 1 and 2) and treats tuples as equal when those columns are the same.

Filtering is then just a set intersection on these special tuples:

>>> list1 = [ (0, 1, 2, 0), (0, 3, 4, 0), (1, 2, 3, 12) ]
>>> list2 = [ (0, 1, 1, 0), (0, 3, 9, 9), (42, 2, 3, 12) ]
>>> set(map(SpecialTuple, list1)) & set(map(SpecialTuple, list2))
set([(42, 2, 3, 12)])

I don't know how fast it is. Let me know. :)
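One thing to keep in mind with the intersection above: a set of these special tuples holds only one row per (col2, col3) pair, and the result contains rows from just one of the two lists. If both temp1 and temp2 from the question are needed, a possible alternative is to intersect plain key sets and then filter each list. This is a sketch, not part of the answer's method; the helper name `key` is an assumption.

```python
list1 = [(0, 1, 2, 0), (0, 3, 4, 0), (1, 2, 3, 12)]
list2 = [(0, 1, 1, 0), (0, 3, 9, 9), (42, 2, 3, 12)]


def key(row):
    """Return the (col2, col3) pair used for matching."""
    return (row[1], row[2])


# Pairs that occur in both lists.
shared = {key(r) for r in list1} & {key(r) for r in list2}

# Keep every row, from each list, whose pair is shared.
temp1 = {r for r in list1 if key(r) in shared}
temp2 = {r for r in list2 if key(r) in shared}

print(temp1, temp2)
```

With the sample data from the answer, the only shared pair is (2, 3), so each result set contains the one row from its own list that carries that pair.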

Filter two lists by comparing all items against each other with numpy or tabular
