Checking what is missing from a list when comparing it to another list in Python

Time: 2016-08-31 17:58:14

Tags: python list sorting

I want to find which items in list B are missing from list A.

If I have the following list of strings:

A = ['4-5', '3-6', '3-3', '9-0'] and B = ['4-4', '4-5', '3-3', '6-9', '5-5', '3-2', '6-6', '9-9', '9,0'] and want to check what is missing from A that is in list B.

A = [4-5,3-6,3-3, 9-0] B = [4-4, 4-5, 3-3, 6-9, 5-5, 3-6, 3-2, 6-6, 9-9, 9,0]

So, from the example above, I would want it to output ['4-4', '6-9', '5-5', '3-2', '6-6', '9-9'].

If I sort both lists, what's the best way of going about it?

Thanks!

I thought about doing something like:

unique = []
for n in A:
    if n not in B:
        unique.append(B)
print(unique)

Does this work? It's giving me a very odd output of a list in a list of two strings.

5 answers:

Answer 0 (score: 5)

I don't know what 4-5 means. Is it a string or an operation?

Anyway, assuming it is whatever you meant it to be, you can do as follows:

A = [4-5,3-6,3-3, 9-0]
B = [4-4, 4-5, 3-3, 6-9, 5-5, 3-2, 6-6, 9-9, 9,0]

a = set(A)
b = set(B)

print(b - a)

Answer 1 (score: 1)

Don't bother sorting. Use sets instead and calculate the difference:

A = ['4-5','3-6','3-3', '9-0']
B = ['4-4', '4-5', '3-3', '6-9', '5-5', '3-2', '6-6', '9-9', '9','0']

print(set(B) - set(A))
>> {'0', '6-9', '9-9', '5-5', '3-2', '6-6', '4-4', '9'}

Your required output was [4-4, 6-9, 5-5, 3-2, 6-6, 9-9]. You either missed a few, or you meant to treat '9' as '9-0'.

Answer 2 (score: 0)

You could do this:

>>> A = ['4-5','3-6','3-3','9-0'] 
>>> B=['4-4','4-5','3-3','6-9','5-5','3-2','6-6','9-9','9','0']
>>> set(B)-set(A)
set(['5-5', '4-4', '9-9', '3-2', '0', '6-9', '9', '6-6'])
>>> 

Answer 3 (score: 0)

Simple in a list comprehension too. Not sure why you would sort the inputs, I don't see that it's really necessary, but I've sorted the output.

A = ["4-5",'3-6','3-3', '9-0']
B = ['4-4', '4-5', '3-3', '6-9', '5-5', '3-2', '6-6', '9-9', '9','0'] 
new = sorted([x for x in B if x not in A])

Note, though, that your expected output doesn't include the last two entries "9" and "0" (or "9,0", depending on interpretation).

Answer 4 (score: 0)

In most situations, the best way is to ignore the fact the data is sorted and just do set(B) - set(A). Or list(set(B) - set(A)) if you definitely need a list for the result.

However, that has a moderately large space overhead (approximately the sum of the sizes of the two input lists). Normally this is nothing to worry about, but if the data is very large (uses more than half your available memory) then you might find you need to reduce this. You could first try:

A_set = set(A)
result = [b for b in B if b not in A_set]

This avoids constructing a set for B or a set for the difference, so the overhead is approximately the size of A.
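As a quick check, here is a minimal sketch of that approach run on the lists from the question. It assumes '9,0' is a single string, as originally typed. Unlike a plain set difference, it also keeps the result in B's original order:

```python
A = ['4-5', '3-6', '3-3', '9-0']
B = ['4-4', '4-5', '3-3', '6-9', '5-5', '3-2', '6-6', '9-9', '9,0']

# Build the lookup set once so each membership test is O(1) on average.
A_set = set(A)

# Keep every element of B that is absent from A, in B's order.
result = [b for b in B if b not in A_set]
print(result)
# ['4-4', '6-9', '5-5', '3-2', '6-6', '9-9', '9,0']
```

The trailing '9,0' appears because it is not the same string as '9-0' in A.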

For your interest, or for situations where resources are very tightly constrained, you might like to know that it's possible to do this with only constant-space overhead supposing that A and B are already sorted (which in your example they are not, but you promise they will be). The trick is to notice that as you look for each element of B in A:

  • you can search through A in order, and stop searching when you find an element no smaller than the one you're looking for. You won't find it after that because A is sorted.
  • if you found the previous element of B, then you will not find the next one before the place where you found the previous element of B in A, because both lists are sorted.
  • if you did not find the previous element of B, then you likewise will not find it before the place where you stopped looking.
  • therefore at each step we can resume searching from where we left off last time.

Putting it all together, this means we can make a single simultaneous pass over each of the inputs A and B, and in the process of this single pass decide for each element of B whether or not it is in A. If you're familiar with merge sort (or merging), then be aware that the process is similar to a merge, but the output is different. Not only is the space overhead constant, but for advanced uses we don't even need lists: we can do it all with generators, so that the input needn't even all be in memory at once. But sticking with lists to illustrate it:

def find_element(elt, arr, idx):
    while idx < len(arr) and elt > arr[idx]:
        # haven't found it yet
        idx += 1
    # have found the place where it would go, but is it here?
    if idx < len(arr) and elt == arr[idx]:
        # it's here
        return True, idx + 1
    # not found
    return False, idx

import itertools

a_idx = 0
b_idx = 0
results = []
while a_idx < len(A) and b_idx < len(B):
    found, a_idx = find_element(B[b_idx], A, a_idx)
    if not found:
        results.append(B[b_idx])
    b_idx += 1
# if there's anything left in B to check then it's definitely not in A
results.extend(itertools.islice(B, b_idx, None))
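For the generator variant mentioned earlier, a minimal sketch might look like the following. The name sorted_difference is made up for illustration, and both inputs are assumed to be sorted in ascending order. Unlike find_element above, it does not step past a match in A, so repeated values in B that occur in A are all dropped.

```python
def sorted_difference(a_sorted, b_sorted):
    """Yield the items of b_sorted that do not occur in a_sorted.

    Both arguments may be any iterables, as long as they are sorted ascending.
    (Hypothetical helper name, used here for illustration only.)
    """
    a_iter = iter(a_sorted)
    sentinel = object()  # marks "A is exhausted"
    a_item = next(a_iter, sentinel)
    for b_item in b_sorted:
        # Skip the elements of A that are smaller than the current B element.
        while a_item is not sentinel and a_item < b_item:
            a_item = next(a_iter, sentinel)
        # Either A is exhausted or its current element is >= b_item.
        if a_item is sentinel or a_item != b_item:
            yield b_item


A = sorted(['4-5', '3-6', '3-3', '9-0'])
B = sorted(['4-4', '4-5', '3-3', '6-9', '5-5', '3-2', '6-6', '9-9', '9,0'])
print(list(sorted_difference(A, B)))
# ['3-2', '4-4', '5-5', '6-6', '6-9', '9,0', '9-9']
```

Only one element of each input is held at a time, so the inputs could equally be file iterators or other generators, provided they yield values in sorted order.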

Finally, we could potentially improve the speed of find_element, for large arrays, using a binary search instead of a linear search. I leave this as an exercise for the reader :-)
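One possible take on that exercise, as a sketch rather than a definitive answer, uses the standard library's bisect module. The name find_element_bisect is hypothetical. It assumes arr is sorted ascending and keeps the same (found, new_idx) return convention as find_element.

```python
import bisect


def find_element_bisect(elt, arr, idx):
    """Binary-search variant of find_element (hypothetical name).

    Searches only arr[idx:] and returns (found, new_idx).
    """
    # First position at or after idx whose value is >= elt.
    pos = bisect.bisect_left(arr, elt, idx)
    if pos < len(arr) and arr[pos] == elt:
        # Found it; resume the next search just past this position.
        return True, pos + 1
    # Not found; pos is where elt would be inserted.
    return False, pos


A_sorted = ['3-3', '3-6', '4-5', '9-0']
print(find_element_bisect('4-5', A_sorted, 0))  # (True, 3)
print(find_element_bisect('4-4', A_sorted, 0))  # (False, 2)
```

Each lookup then costs O(log n) instead of a linear scan. This mainly helps when B is much shorter than A, since the linear version already makes only one pass over A in total.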