Finding element pairs from two sets with a predefined relationship

Time: 2019-04-29 20:54:25

Tags: python set compare element

I have two lists:

list1 = ['a', 'b', 'c', 'd']
list2 = ['e', 'f', 'g', 'h']

From a third list, I also know which of these elements are related to each other:

ref_list = [
   ['d', 'f'], ['a', 'e'], ['b', 'g'], ['c', 'f'], ['a', 'g'],
   ['a', 'f'], ['b', 'e'], ['b', 'f'], ['c', 'e'], ['c', 'g']
]

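I want to quickly determine two groups, one from list1 and one from list2, such that every possible pair [list1 element, list2 element] between the two groups is in ref_list.
In this case, the solution would be: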

[['a', 'b', 'c'], ['e', 'f', 'g']]
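The requirement can be restated as: every element of the Cartesian product of the two groups must appear in ref_list. A minimal sketch of that check, assuming the pairs are first converted to a set of tuples (lists are unhashable, so they cannot go into a set directly); covers_all_pairs is a hypothetical helper name, not from the question:

```python
from itertools import product

list1 = ['a', 'b', 'c', 'd']
list2 = ['e', 'f', 'g', 'h']
ref_list = [
    ['d', 'f'], ['a', 'e'], ['b', 'g'], ['c', 'f'], ['a', 'g'],
    ['a', 'f'], ['b', 'e'], ['b', 'f'], ['c', 'e'], ['c', 'g']
]

# Store each pair as a tuple so membership tests take O(1) time on average.
pairs = {tuple(p) for p in ref_list}

def covers_all_pairs(group1, group2, pairs):
    """Hypothetical helper: True if every (x, y) with x in group1 and y in group2 is in pairs."""
    return all((x, y) in pairs for x, y in product(group1, group2))

print(covers_all_pairs(['a', 'b', 'c'], ['e', 'f', 'g'], pairs))       # True
print(covers_all_pairs(['a', 'b', 'c', 'd'], ['e', 'f', 'g'], pairs))  # False: ('d', 'e') is missing
```

This only verifies a candidate answer; it does not search for one.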

I can think of a few ways to handle lists this small, but I need help for the case where list1, list2 and ref_list each have thousands of elements.

3 Answers:

Answer 0 (score: 0)

Set containment seems to be very fast.
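One possible reading of the set-containment idea, sketched under assumptions: build, for each list1 element, the set of list2 elements it is paired with; a list1 element can join a group exactly when the group's list2 side is a subset of its neighbour set. The sketch below is a heuristic, not the answerer's code: it only tries each element's full neighbour set as the candidate list2 group and keeps the candidate covering the most pairs, so it is not guaranteed to find the largest possible groups for every input. find_groups is a hypothetical name.

```python
from collections import defaultdict

list1 = ['a', 'b', 'c', 'd']
list2 = ['e', 'f', 'g', 'h']
ref_list = [
    ['d', 'f'], ['a', 'e'], ['b', 'g'], ['c', 'f'], ['a', 'g'],
    ['a', 'f'], ['b', 'e'], ['b', 'f'], ['c', 'e'], ['c', 'g']
]

def find_groups(list1, list2, ref_list):
    """Hypothetical heuristic: two groups whose every cross pair is in ref_list."""
    allowed2 = set(list2)
    # neighbours[x] = set of list2 elements paired with x (pairs outside list2 are ignored)
    neighbours = defaultdict(set)
    for x, y in ref_list:
        if y in allowed2:
            neighbours[x].add(y)

    best, best_size = ([], []), 0
    for x in list1:
        candidate2 = neighbours[x]
        if not candidate2:
            continue
        # Set containment: keep every list1 element whose neighbours include all of candidate2.
        group1 = [z for z in list1 if candidate2 <= neighbours[z]]
        size = len(group1) * len(candidate2)
        if size > best_size:
            best_size = size
            # Keep list2's original order in the result.
            best = (group1, [y for y in list2 if y in candidate2])
    return [best[0], best[1]]

print(find_groups(list1, list2, ref_list))  # [['a', 'b', 'c'], ['e', 'f', 'g']]
```

In the worst case this performs about len(list1) ** 2 subset tests, each bounded by the size of the candidate set.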


Answer 1 (score: 0)

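You can add the elements of each pair in ref_list to two sets, set1 and set2, and then use list1 = list(set1) and list2 = list(set2). Sets contain no duplicates, and this should be fast even for thousands of elements, because a set membership test such as e in s1 takes O(1) time on average.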
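A minimal sketch of this suggestion on the question's data. Note what it produces: every element that appears in any pair, so 'd' ends up in the first group even though ('d', 'e') and ('d', 'g') are not in ref_list. Enforcing the all-pairs condition would need an additional check.

```python
ref_list = [
    ['d', 'f'], ['a', 'e'], ['b', 'g'], ['c', 'f'], ['a', 'g'],
    ['a', 'f'], ['b', 'e'], ['b', 'f'], ['c', 'e'], ['c', 'g']
]

set1 = set()
set2 = set()
for x, y in ref_list:
    set1.add(x)  # first element of each pair
    set2.add(y)  # second element of each pair

# Set iteration order is arbitrary, so sort for a stable display.
list1 = list(set1)
list2 = list(set2)
print(sorted(list1), sorted(list2))  # ['a', 'b', 'c', 'd'] ['e', 'f', 'g']
```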

Answer 2 (score: 0)

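You can use collections.Counter to count how often each item occurs on each side of ref_list, and use those counts to keep only the items of the two lists that occur more than once: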

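from collections import Counter
# zip(*ref_list) splits the pairs into first and second elements; Counter(ref) counts each side once
[[i for i in lst if counts.get(i, 0) > 1] for lst, ref in zip((list1, list2), zip(*ref_list)) for counts in (Counter(ref),)]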

This returns:

[['a', 'b', 'c'], ['e', 'f', 'g']]
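The one-liner packs several steps into a nested comprehension. An equivalent, more readable sketch follows. The final all(...) test is an addition, not part of the original answer: counting only shows that an element takes part in more than one pair, not that every cross pair is present, so the result is checked explicitly on this data.

```python
from collections import Counter
from itertools import product

list1 = ['a', 'b', 'c', 'd']
list2 = ['e', 'f', 'g', 'h']
ref_list = [
    ['d', 'f'], ['a', 'e'], ['b', 'g'], ['c', 'f'], ['a', 'g'],
    ['a', 'f'], ['b', 'e'], ['b', 'f'], ['c', 'e'], ['c', 'g']
]

# Count occurrences on each side of the pairs; Counter returns 0 for missing keys.
counts1 = Counter(x for x, _ in ref_list)
counts2 = Counter(y for _, y in ref_list)

# Same filter as the one-liner: keep elements that occur in more than one pair.
group1 = [x for x in list1 if counts1[x] > 1]
group2 = [y for y in list2 if counts2[y] > 1]
print([group1, group2])  # [['a', 'b', 'c'], ['e', 'f', 'g']]

# Explicitly confirm that every cross pair exists in ref_list.
pairs = {tuple(p) for p in ref_list}
print(all((x, y) in pairs for x, y in product(group1, group2)))  # True
```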