Question

I have a list of the form

[(2,3),(4,3),(3,4),(1,4),(5,4),(6,5)]

I want to scan the list and return the elements whose (i,1) entries are repeated. (I apologize, I couldn't frame this better.)

For example, in the given list the pairs (2,3),(4,3) share the second element 3, so I want to return 2 and 4. Similarly, from (3,4),(1,4),(5,4) I want to return 3, 1 and 5 because 4 is repeated.

I have implemented a bubble-style search, but that is obviously very slow.

        for i in range(0,p):
            for j in range(i+1,p):
                if (arr[i][1] == arr[j][1]):
                    print(arr[i][0],arr[j][0])

How do I go about it?

Answer 1

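You can use collections.defaultdict. This builds a mapping from each second item to the list of first items that share it. You can then filter for the repeated ones with a dictionary comprehension.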

from collections import defaultdict

lst = [(2,3),(4,3),(3,4),(1,4),(5,4),(6,5)]

d = defaultdict(list)

for i, j in lst:
    d[j].append(i)

print(d)

# defaultdict(<class 'list'>, {3: [2, 4], 4: [3, 1, 5], 5: [6]})

res = {k: v for k, v in d.items() if len(v)>1}

print(res)

# {3: [2, 4], 4: [3, 1, 5]}
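
As a minimal sketch, the same grouping idea can be wrapped in a function. The name group_repeated is hypothetical, chosen here for illustration and not part of any library. It makes a single pass over the list, so unlike the nested loop in the question the work grows roughly linearly with the number of pairs.

```python
from collections import defaultdict


def group_repeated(pairs):
    """Map each repeated second item to the first items that share it."""
    groups = defaultdict(list)
    for first, second in pairs:
        groups[second].append(first)
    # Keep only the second items that occur more than once.
    return {second: firsts for second, firsts in groups.items() if len(firsts) > 1}


pairs = [(2, 3), (4, 3), (3, 4), (1, 4), (5, 4), (6, 5)]
repeated = group_repeated(pairs)

for second, firsts in repeated.items():
    print(second, firsts)
# 3 [2, 4]
# 4 [3, 1, 5]
```

The result is a plain dict, so the first items for a given repeated value can be read off directly, for example repeated[4].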

Answer 2

Using numpy lets you avoid the for loops:

import numpy as np
l = [(2,3),(4,3),(3,4),(1,4),(5,4),(6,5)]

a = np.array(l) 
items, counts = np.unique(a[:,1], return_counts=True)
is_duplicate = np.isin(a[:,1], items[counts > 1]) # get elements that have more than one count
print(a[is_duplicate, 0]) # return elements with duplicates

# tuple(map(tuple, a[is_duplicate, :])) # use this to get tuples in output

(Toggle the comment to get the output as tuples)
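
To make the count-then-mask logic above easier to follow, here is a rough plain-Python equivalent using collections.Counter. It is only a sketch for illustration: it loops in Python rather than in numpy, and the variable names are assumptions that mirror the numpy snippet.

```python
from collections import Counter

pairs = [(2, 3), (4, 3), (3, 4), (1, 4), (5, 4), (6, 5)]

# Count how often each second item occurs (the role of np.unique with return_counts).
counts = Counter(second for _, second in pairs)

# Boolean mask: True where the pair's second item occurs more than once (the role of np.isin).
is_duplicate = [counts[second] > 1 for _, second in pairs]

# Apply the mask to pick out the first items, keeping the original order.
firsts = [first for (first, _), dup in zip(pairs, is_duplicate) if dup]
print(firsts)
# [2, 4, 3, 1, 5]
```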

pandas is another option:

import pandas as pd
l = [(2,3),(4,3),(3,4),(1,4),(5,4),(6,5)]

df = pd.DataFrame(l, columns=['first', 'second'])
# keep only the rows whose 'second' value occurs more than once
print(df.groupby('second').filter(lambda x: len(x) > 1))

Filtering a (Nx1) list in Python