Question

I have two lists:

list1 = ['a', 'b', 'c', 'd']
list2 = ['e', 'f', 'g', 'h']

I also know that some of their elements are related to each other through another list:

ref_list = [
   ['d', 'f'], ['a', 'e'], ['b', 'g'], ['c', 'f'], ['a', 'g'],
   ['a', 'f'], ['b', 'e'], ['b', 'f'], ['c', 'e'], ['c', 'g']
]

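I want to quickly identify the two groups, one from list1 and one from list2, for which every possible pair [list1 element, list2 element] appears in ref_list.
In this case the solution would be: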

[['a', 'b', 'c'], ['e', 'f', 'g']]

I can think of a few ways to do this for lists this small, but I need help for the case where list1, list2 and ref_list each have thousands of elements.

Answer 1

Set membership tests seem to be very fast.
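A minimal sketch of what a set-based check might look like, assuming the pairs in ref_list are first converted to a set of tuples. The helper name `is_complete` is hypothetical and not part of the original answer; it only verifies a candidate pair of groups and does not search for them.

```python
from itertools import product

list1 = ['a', 'b', 'c', 'd']
list2 = ['e', 'f', 'g', 'h']
ref_list = [
    ['d', 'f'], ['a', 'e'], ['b', 'g'], ['c', 'f'], ['a', 'g'],
    ['a', 'f'], ['b', 'e'], ['b', 'f'], ['c', 'e'], ['c', 'g'],
]

# Lists are unhashable, so store each pair as a tuple.
# Membership tests on a set take O(1) time on average.
ref_set = {tuple(pair) for pair in ref_list}


def is_complete(group1, group2, pairs=ref_set):
    """Hypothetical helper: True if every (x, y) with x in group1
    and y in group2 is present in pairs."""
    return all(pair in pairs for pair in product(group1, group2))


print(is_complete(['a', 'b', 'c'], ['e', 'f', 'g']))       # True
print(is_complete(['a', 'b', 'c', 'd'], ['e', 'f', 'g']))  # False, ('d', 'e') is missing
```

Each check costs one set lookup per candidate pair, so the cost grows with len(group1) * len(group2) rather than with the length of ref_list.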

Answer 2

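You can add the elements of each pair in ref_list to two sets, set1 and set2, and then use list1 = list(set1) and list2 = list(set2). Sets do not contain duplicates, and this should be fast for thousands of elements, because a membership test e in s1 on a set takes O(1) time on average.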
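A rough sketch of this idea, assuming ref_list holds two-element pairs as in the question. The names `new_list1` and `new_list2` are illustrative, and `sorted` is used here instead of `list` only to make the printed order deterministic.

```python
ref_list = [
    ['d', 'f'], ['a', 'e'], ['b', 'g'], ['c', 'f'], ['a', 'g'],
    ['a', 'f'], ['b', 'e'], ['b', 'f'], ['c', 'e'], ['c', 'g'],
]

set1, set2 = set(), set()
for first, second in ref_list:
    set1.add(first)   # elements that came from list1
    set2.add(second)  # elements that came from list2

# sorted() is an assumption for stable output; list(set1) has arbitrary order.
new_list1 = sorted(set1)
new_list2 = sorted(set2)

print(new_list1)  # ['a', 'b', 'c', 'd']
print(new_list2)  # ['e', 'f', 'g']
```

On the sample data this keeps 'd', since it appears in the pair ['d', 'f'], so a further filtering step would be needed to reach the result given in the question.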

Answer 3

You can use collections.Counter to count the items in ref_list, and use those counts to keep only the items of each list that occur more than once:

from collections import Counter
[[i for i in lst if counts.get(i, 0) > 1] for lst, ref in zip((list1, list2), zip(*ref_list)) for counts in (Counter(ref),)]

This returns:

[['a', 'b', 'c'], ['e', 'f', 'g']]
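For readability, the one-liner can be unpacked into an explicit loop. This is only a sketch of the same counting idea: the function name `filter_by_count` and the `min_count` parameter are hypothetical, and a non-empty ref_list of two-element pairs is assumed.

```python
from collections import Counter


def filter_by_count(list1, list2, ref_list, min_count=2):
    """Keep items of each list that occur at least min_count times
    in the matching column of ref_list (hypothetical helper)."""
    # zip(*ref_list) splits the pairs into first and second elements.
    firsts, seconds = zip(*ref_list)
    result = []
    for lst, column in ((list1, firsts), (list2, seconds)):
        counts = Counter(column)
        # Counter returns 0 for missing keys, so no .get() is needed.
        result.append([item for item in lst if counts[item] >= min_count])
    return result


list1 = ['a', 'b', 'c', 'd']
list2 = ['e', 'f', 'g', 'h']
ref_list = [
    ['d', 'f'], ['a', 'e'], ['b', 'g'], ['c', 'f'], ['a', 'g'],
    ['a', 'f'], ['b', 'e'], ['b', 'f'], ['c', 'e'], ['c', 'g'],
]

print(filter_by_count(list1, list2, ref_list))
# [['a', 'b', 'c'], ['e', 'f', 'g']]
```

Building the two counters is a single pass over ref_list, and each list is then filtered in one pass, so the work is linear in the combined input size.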

Finding pairs of elements from two sets with predefined relationships