Question

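I have a list of strings in which the first part of each string is a substring shared with other elements of the list. My goal is to find all similar strings, i.e. the elements containing the substring 'ID_1', group them together, and then add up their respective values after the "=".

Example: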

start_list = ['ID_1=1', 'ID_1=2', 'ID_1=3', 'ID_2=4', 'ID_2=5', 'ID_2=6']

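I have tried iterating over start_list with a for loop, building various nested lists, and even using a dictionary, but I keep going around in circles.

I know there is an elegant solution somewhere.

My expected output is: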

ID_1 = 6
ID_2 = 15

Thanks in advance!

Answer 1

You can do this elegantly using groupby from itertools:

from itertools import groupby

l = ['ID_1=1', 'ID_1=2', 'ID_1=3', 'ID_2=4', 'ID_2=5', 'ID_2=6']
l_2 = sorted(x.split('=') for x in l)

ans = [(k, sum(int(y) for x,y in g))
       for k,g in  groupby(l_2, key=lambda x: x[0])]

for key, value in ans:
    print(key, '=', value)

Other elegant solutions could use defaultdict or reduce.

Note that this is an O(n log n) solution, because the list has to be sorted before grouping.
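
The sort matters because groupby only merges consecutive items that share a key. Here is a minimal sketch of that behaviour, assuming a hypothetical interleaved list named mixed and a hypothetical helper sum_groups, neither of which appears in the question:

```python
from itertools import groupby

# Hypothetical input (not from the question): same data, keys interleaved
mixed = ['ID_1=1', 'ID_2=4', 'ID_1=2', 'ID_2=5', 'ID_1=3', 'ID_2=6']
pairs = [x.split('=') for x in mixed]

def sum_groups(rows):
    # groupby starts a new group every time the key changes
    return [(k, sum(int(v) for _, v in g))
            for k, g in groupby(rows, key=lambda p: p[0])]

unsorted_result = sum_groups(pairs)
sorted_result = sum_groups(sorted(pairs, key=lambda p: p[0]))

print(unsorted_result)  # one entry per run of equal keys, so keys repeat
print(sorted_result)    # [('ID_1', 6), ('ID_2', 15)]
```

Sorting on the key alone is enough here, since sorted is stable and the values only need to be summed.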

Answer 2

You can use defaultdict. I find it the most compact and most correct variant.

Code:

from collections import defaultdict

start_list = ['ID_1=1', 'ID_1=2', 'ID_1=3', 'ID_2=4', 'ID_2=5', 'ID_2=6']

d = defaultdict(int)
lst = [item.split('=') for item in start_list]
for k, v in lst:
    d[k] += int(v)

print(d.items())

Output:

dict_items([('ID_1', 6), ('ID_2', 15)])

You can iterate over d.items() to print the data in the desired format.

Code:

for k, v in d.items():
    print(f"{k}={v}")

Output:

ID_1=6
ID_2=15
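
As a small illustration of why no membership check is needed: defaultdict(int) creates a missing entry by calling int(), which returns 0, so += works on a key that has never been seen. The key 'ID_3' below is hypothetical and only used for the demonstration:

```python
from collections import defaultdict

d = defaultdict(int)

# Accessing a missing key calls int() and stores the result, 0
print(d['ID_3'])  # 0

# So an in-place addition works without checking for the key first
d['ID_3'] += 7
print(dict(d))    # {'ID_3': 7}
```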

Answer 3

You can use collections.Counter to keep track of the sums. If you like, you can combine it with functools.reduce and even turn it into a one-liner:

>>> from functools import reduce
>>> from collections import Counter
>>> start_list = ['ID_1=1', 'ID_1=2', 'ID_1=3', 'ID_2=4', 'ID_2=5', 'ID_2=6']
>>> reduce(lambda c, x: c.update({x[0]: int(x[1])}) or c,
...        (x.split("=") for x in start_list), Counter())
...
Counter({'ID_2': 15, 'ID_1': 6})

(Here, or c makes the lambda return c instead of the result of update, which is None.)
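
The one-liner relies on Counter.update adding to existing counts rather than overwriting them, unlike dict.update. A minimal sketch of the same idea as an explicit loop, where the name totals is an assumption made for illustration:

```python
from collections import Counter

start_list = ['ID_1=1', 'ID_1=2', 'ID_1=3', 'ID_2=4', 'ID_2=5', 'ID_2=6']

totals = Counter()
for item in start_list:
    key, value = item.split('=')
    # Counter.update adds to the stored count instead of replacing it
    totals.update({key: int(value)})

for key, value in totals.items():
    print(key, '=', value)
```

The loop form is longer than the reduce version but avoids the or c trick.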

Answer 4

start_list = ['ID_1=1', 'ID_1=2', 'ID_1=3', 'ID_2=4', 'ID_2=5', 'ID_2=6']
totals = {}  # avoid naming this 'dict', which would shadow the built-in type

for item in start_list:
    # Split once into the key and its numeric value
    k, v = item.split('=')
    if k in totals:
        totals[k] += int(v)
    else:
        totals[k] = int(v)

print(totals)  # {'ID_1': 6, 'ID_2': 15}

for key, val in totals.items():
    print("{} = {}".format(key, val))

Output:

ID_1 = 6
ID_2 = 15

Answer 5

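Since this is your first question, my approach is to keep things as simple and straightforward as possible, with plenty of comments at each step to explain the details.

More sophisticated or more pythonic code might be a better solution, but it could leave you with code you cannot easily understand or customize.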

start_list = ['ID_1=1', 'ID_1=2', 'ID_1=3', 'ID_2=4', 'ID_2=5', 'ID_2=6']
print(start_list)

# Here I am preparing an empty dictionary to store the counted keys and values
counted = {}
# Now I iterate through every string in start_list
for item in start_list:
    # As 1st thing I will use split method to separate the current_key
    current_key = item.split("=")[0]
    # and the current value. 
    current_value = int(item.split("=")[1])
    # Then I check if current_key (e.g. ID_1) is present in the
    # count dictionary using "in"
    if current_key in counted:
        # If the key is present I update its value with the sum
        # of its old value + new one
        counted[current_key] = current_value + counted[current_key]
    else:
        # If the key doesn't exist it means that we are adding it
        # to the counted dictionary for the 1st time
        counted[current_key] = current_value 

# Job is done!
print(counted)

# It is now easy to iterate through counted dict for further manipulation
# for example let's print the number of hits for ID_1

# You can use items() to enumerate keys and values in a dictionary
for key, value in counted.items():
    if key == "ID_1":
        print("Found ID_1 value: " + str(value))

# To obtain the output in your requirement
for key in counted.keys():
    print( '%s = %d' %(key, counted[key]))

If you want to learn more about how the split method works, see the examples here:
https://www.w3schools.com/python/ref_string_split.asp

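In the other answers you will find more concise ways to get this result.

So, to improve on the code I wrote, I suggest reading more about list comprehensions here: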
https://www.pythonforbeginners.com/basics/list-comprehensions-in-python

Happy hacking!

Answer 6

You can use a list comprehension plus a dict comprehension:

start_list = ['ID_1=1', 'ID_1=2', 'ID_1=3', 'ID_2=4', 'ID_2=5', 'ID_2=6']
l = [i.split('=') for i in start_list]
d = dict(l)
print({k:sum([int(i[1]) for i in l if i[0] == k]) for k,v in d.items()})

Output:

{'ID_1': 6, 'ID_2': 15}

Answer 7

If you can be sure the data always has the same format, you can iterate over the list and build a dictionary to hold the results:

start_list = ['ID_1=1', 'ID_1=2', 'ID_1=3', 'ID_2=4', 'ID_2=5', 'ID_2=6']
result = {}

for item in start_list:
    key, value = item.split('=')  # 'key' avoids shadowing the built-in id()
    # Start from 0 if the key is not in 'result' yet, otherwise add to the running total
    result[key] = result.get(key, 0) + int(value)

print(result) # {'ID_1': 6, 'ID_2': 15}
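
If the format cannot be guaranteed, one option is to skip entries that do not parse. The following is only a sketch under that assumption: the helper name sum_by_id and the sample list messy_list are hypothetical and not part of the question.

```python
def sum_by_id(items):
    """Sum integer values per key, skipping entries that are not 'key=int'."""
    result = {}
    for item in items:
        # partition never raises; a missing '=' leaves value as ''
        key, _, value = item.partition('=')
        try:
            number = int(value)
        except ValueError:
            # Covers a missing '=', an empty value and non-numeric text
            continue
        result[key] = result.get(key, 0) + number
    return result

# Hypothetical input with two malformed entries
messy_list = ['ID_1=1', 'ID_1=2', 'broken', 'ID_2=x', 'ID_2=4']
print(sum_by_id(messy_list))  # {'ID_1': 3, 'ID_2': 4}
```

Whether silently skipping bad entries is acceptable depends on the data; raising an error may be the better choice when malformed input signals a bug upstream.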

Answer 8

You can do the following:

l = ['ID_1=1', 'ID_1=2', 'ID_1=3', 'ID_2=4', 'ID_2=5', 'ID_2=6']

def calculate_score_byid(items):
    '''takes a list of items and adds up scores. returns a dictionary of scores'''
    d = dict()
    for i in items:  # iterate over the argument, not the global list
        key, score = i.split('=')
        if key not in d:
            d[key] = int(score)
        else:
            d[key] += int(score)
    return d

# Keep the returned dictionary; 'd' inside the function is local to it
d = calculate_score_byid(l)
for key in d.keys():
    print('%s = %d' % (key, d[key]))

Output:

ID_1 = 6
ID_2 = 15

Python: find corresponding elements in a list and sum them into a new list

8 answers: