如果找到重复,则取列表中的值的均值

时间:2017-12-05 14:36:32

标签: python arrays list for-loop append

我有2个相互关联的列表。例如,在这里,'John'与'1'相关联,'Bob'与4相关联,依此类推:

l1 = ['John', 'Bob', 'Stew', 'John']
l2 = [1, 4, 7, 3]

我的问题在于重复的约翰。而不是添加重复的John,我想取与Johns相关的值的平均值,即1和3,即(3 + 1)/ 2 = 2.因此,我希望列表实际上是:

l1 = ['John', 'Bob', 'Stew']
l2 = [2, 4, 7]

我已经尝试了一些解决方案,包括for循环和“包含”功能,但似乎无法将它拼凑在一起。我对Python不是很有经验,但链接列表听起来像是可以用于此。

谢谢

5 个答案:

答案 0 :(得分:3)

我相信你应该使用 dict 。 :)

def mean_duplicate(l1, l2):
    ret = {}
    #   Iterating through both lists...
    for name, value in zip(l1, l2):
        if not name in ret:
            #   If the key doesn't exist, create it.
            ret[name] = value
        else:
            #   If it already does exist, update it.
            ret[name] += value

    #   Then for the average you're looking for...
    for key, value in ret.iteritems():
        ret[key] = value / l1.count(key)

    return ret

def median_between_listsElements(l1, l2):
    ret = {}

    for name, value in zip(l1, l2):
        #   Creating key + list if doesn't exist.
        if not name in ret:
            ret[name] = []
        ret[name].append(value)

    for key, value in ret.iteritems():
        ret[key] = np.median(value)

    return ret

l1 = ['John', 'Bob', 'Stew', 'John']
l2 = [1, 4, 7, 3]

print mean_duplicate(l1, l2)
print median_between_listsElements(l1, l2)
# {'Bob': 4, 'John': 2, 'Stew': 7}
# {'Bob': 4.0, 'John': 2.0, 'Stew': 7.0}

答案 1 :(得分:1)

以下可能会给你一个想法。它使用OrderedDict,假设您希望从原始列表中按出现顺序排列项目:

from collections import OrderedDict

d = OrderedDict()
for x, y in zip(l1, l2):
    d.setdefault(x, []).get(x).append(y)
# OrderedDict([('John', [1, 3]), ('Bob', [4]), ('Stew', [7])])


names, values = zip(*((k, sum(v)/len(v)) for k, v in d.items()))
# ('John', 'Bob', 'Stew')
# (2.0, 4.0, 7.0)

答案 2 :(得分:0)

这是使用dict的简短版本,

final_dict = {}
l1 = ['John', 'Bob', 'Stew', 'John']
l2 = [1, 4, 7, 3]

for i in range(len(l1)):
    if final_dict.get(l1[i]) == None:
        final_dict[l1[i]] = l2[i]
    else:
        final_dict[l1[i]] = int((final_dict[l1[i]] + l2[i])/2)


print(final_dict)

答案 3 :(得分:0)

这样的事情:

#!/usr/bin/python
l1 = ['John', 'Bob', 'Stew', 'John']
l2 = [1, 4, 7, 3]
d={}
for i in range(0, len(l1)):
    key = l1[i]
    if d.has_key(key):
         d[key].append(l2[i])
    else:
         d[key] = [l2[i]]
r = []
for values in d.values():
    r.append((key,sum(values)/len(values)))
print r

答案 4 :(得分:0)

希望以下代码有帮助

l1 = ['John', 'Bob', 'Stew', 'John']
l2 = [1, 4, 7, 3]

def remove_repeating_names(names_list, numbers_list):
    new_names_list = []
    new_numbers_list = []
    for first_index, first_name in enumerate(names_list):
        amount_of_occurencies = 1
        number = numbers_list[first_index]
        for second_index, second_name in enumerate(names_list):
            # Check if names match and
            # if this name wasn't read in earlier cycles or is not same element.
            if (second_name == first_name):
                if (first_index < second_index):
                    number += numbers_list[second_index]
                    amount_of_occurencies += 1
            # Break the loop if this name was read earlier.
                elif (first_index > second_index):
                    amount_of_occurencies = -1
                    break
        if amount_of_occurencies is not -1:
            new_names_list.append(first_name)
            new_numbers_list.append(number/amount_of_occurencies)
    return [new_names_list, new_numbers_list]

# Unmodified arrays
print(l1)
print(l2)

l1, l2 = remove_repeating_names(l1, l2)

# If you want numbers list to be integer, not float, uncomment following line:
# l2 = [int(number) for number in l2]

# Modified arrays
print(l1)
print(l2)