I need to compare the first n characters of items in a list to the first n characters of other items in the same list, then remove or keep one of those items.
In the example list below, “ABC2222_100” and “ABC2222_P100” would be considered duplicates (even though they're technically unique) because the first 6 characters match. When comparing the two values, if x[-4:] == "P100", then that value would be kept in the list and the value without the “P” would be removed. The other items in the list would be kept since there isn’t a duplicate, regardless of whether the string ends in a “P100” or “100” suffix. For this case, there will never be more than one duplicate (either a “P” or not).
I understand slicing and comparing, but everything is assuming unique values. I was hoping to use list comprehension instead of a long for loop, but also want to understand what I'm seeing. I've gotten lost trying to figure out collections, sets, zip, etc. for this non-unique scenario.
Slicing and comparing isn't going to retain the required suffix that needs to be maintained in the final list.
newList = [x[:6] for x in myList]
This is how it should start and end.
myList = ['ABC1111_P100', 'ABC2222_100', 'ABC2222_P100', 'ABC3333_P100', 'ABC4444_100', 'ABC5555_P100']
newList = ['ABC1111_P100', 'ABC2222_P100', 'ABC3333_P100', 'ABC4444_100', 'ABC5555_P100']
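To make the problem concrete, here is a minimal sketch of what the slice-only approach produces on the list above. The names prefixes and collisions are only illustrative; the point is that slicing finds the colliding prefix but throws away the suffix needed to decide which item to keep.

```python
from collections import Counter

myList = ['ABC1111_P100', 'ABC2222_100', 'ABC2222_P100',
          'ABC3333_P100', 'ABC4444_100', 'ABC5555_P100']

# Slicing alone keeps only the first 6 characters, so the
# "_100" / "_P100" suffix is no longer available afterwards.
prefixes = [x[:6] for x in myList]
print(prefixes)

# Counting the prefixes shows which ones occur more than once,
# but not which of the original strings should survive.
collisions = [p for p, n in Counter(prefixes).items() if n > 1]
print(collisions)
```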
Answer 0 (score: 0)
As noted in your comments, you can't do this in one go. You can do it in O(n) time, but it will take some extra space:
myList = ['ABC1111_P100', 'ABC2222_100', 'ABC2222_P100', 'ABC3333_P100', 'ABC4444_100', 'ABC5555_P100']
seen = dict()  # maps each prefix to the suffix we currently intend to keep
print(myList)
for x in myList:
    # Split the string into its prefix (start) and suffix (end)
    start, end = x.split('_')
    if start in seen:  # We have seen this prefix before
        if seen[start] != 'P100':  # The stored suffix is not the P variant
            seen[start] = end  # so replace it with the current suffix
    else:
        # First time we see this prefix, so add it to our dict.
        seen[start] = end
# Dicts keep insertion order (Python 3.7+), so the original order is preserved.
final_list = ["{}_{}".format(key, value) for key, value in seen.items()]
print(final_list)
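The same idea can be wrapped in a small reusable function. This is only a sketch under the question's stated assumptions (one underscore separator per item, at most one duplicate per prefix, and the "P" variant wins); the name dedupe_prefer_p and its sep parameter are hypothetical and not part of the original answer. It stores the whole string per prefix, so the final list does not have to be rebuilt with format.

```python
def dedupe_prefer_p(items, sep="_"):
    """Keep one item per prefix, preferring the one whose suffix starts with 'P'.

    Hypothetical helper: assumes each item looks like '<prefix><sep><suffix>'.
    """
    best = {}  # prefix -> item currently chosen for that prefix
    for item in items:
        prefix, _, suffix = item.partition(sep)
        # Take the item if the prefix is new, or if this is the P variant.
        if prefix not in best or suffix.startswith("P"):
            best[prefix] = item
    # Dicts preserve insertion order (Python 3.7+), so the order of
    # first appearance in the input is kept.
    return list(best.values())


myList = ['ABC1111_P100', 'ABC2222_100', 'ABC2222_P100',
          'ABC3333_P100', 'ABC4444_100', 'ABC5555_P100']
newList = dedupe_prefer_p(myList)
print(newList)
```

Because the key is everything before the separator rather than a fixed-length slice, this does not depend on the prefix being exactly 6 characters long.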