Question

I have many lists of numbers that can contain decimals. For example,

A = ['1', '1.01', '1.1', '2', '3', '3.2', '4', '5']

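Suppose I want to average the values that differ from one another by less than 0.5, and build a new list from those averages plus the values that were left untouched.

In my example, the numbers 1, 1.01 and 1.1 differ from one another by less than 0.5, so the new list will include their average, 1.04. Likewise, for 3 and 3.2 the new list will include their average, 3.1.

So the final output would be: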

B = [1.04, 2, 3.1, 4, 5]

There are some special cases, such as the list

C = [1.1, 1.2, 1.3, 1.4, 1.5, 1.6]

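which raise a question: do we average the first 5 elements, or the last 5? If possible, I would prefer left-to-right precedence, i.e. group the first 5 elements and leave the 6th alone. However, the data in my lists is unlikely to show this kind of behaviour, because similar values are close enough to each other. There is no need to cover these cases in the code unless that is required for it to work properly.

What is the most efficient way to do this? In practice, I will use it to build light curves of different supernovae. If the time difference between two observations is less than some value, I might as well treat them as a single observation made at the mean time of the two.

I am very new to Python and all my attempts to solve this so far have failed... I apologize if this is too basic.

Thanks in advance!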

Answer 1

# The list must contain numbers (not strings) and be sorted in ascending order.
A = [1, 1.01, 1.1, 2, 3, 3.2, 4, 5]
groups, current_group, first = [], [], A[0]
for item in A:
    # Check whether this element still belongs to the current group,
    # i.e. it is less than 0.5 above the group's first element
    if item - first < 0.5:
        current_group.append(item)
    else:
        # If it doesn't, store the finished group and start a new one
        groups.append(current_group)
        current_group, first = [item], item
# Add the last group that was being gathered to the result
groups.append(current_group)

Now, getting the averages is straightforward:

print([sum(item) / len(item) for item in groups])
# Python 3 prints roughly [1.0367, 2.0, 3.1, 4.0, 5.0]
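
The same loop can be wrapped in a function that also accepts the string input from the question. The following is a minimal sketch, not part of the original answer: the name `average_close` and its `width` parameter are made up for illustration, and it assumes "similar" means strictly less than `width` above the first element of the current group.

```python
def average_close(values, width=0.5):
    """Average runs of values lying less than `width` above the run's first value.

    `average_close` and `width` are illustrative names, not from the answer.
    """
    numbers = sorted(float(v) for v in values)  # accept strings, ensure order
    if not numbers:
        return []
    groups = [[numbers[0]]]
    for x in numbers[1:]:
        if x - groups[-1][0] < width:
            groups[-1].append(x)   # still within reach of the group's first value
        else:
            groups.append([x])     # start a new group
    return [sum(g) / len(g) for g in groups]


A = ['1', '1.01', '1.1', '2', '3', '3.2', '4', '5']
print([round(m, 2) for m in average_close(A)])
# [1.04, 2.0, 3.1, 4.0, 5.0]
```

With the list C from the question, this groups the first five elements and keeps 1.6 on its own, which is the left-to-right precedence the question asks for.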

Answer 2

I think @thefourtheye's answer is better than this one.

A = ['1', '1.01', '1.1', '2', '3', '3.2', '4', '5']
# Convert the strings to floats and sort them.
a = sorted([float(v) for v in A])

averaged_start = a[0]

averaged_dict = {}
for value in a:
    if value - averaged_start < 0.5:
        averaged_dict.setdefault(averaged_start, []).append(value)
    else:
        averaged_start = value
        averaged_dict[averaged_start] = [averaged_start]

# Dicts keep insertion order in Python 3.7+, so the averages come out sorted.
result = [round(sum(v) / len(v), 2) for v in averaged_dict.values()]
print(result)

Output:

[1.04, 2.0, 3.1, 4.0, 5.0]
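
Both answers so far compare each value with the first value of its group. A different reading of "similar" compares each value with its immediate predecessor, which lets a long chain of close values merge into one group. The sketch below contrasts the two readings on list C from the question. The function names `group_by_anchor` and `group_by_gap` are hypothetical, chosen only for this illustration.

```python
def group_by_anchor(sorted_values, width=0.5):
    """Start a new group once a value is `width` or more above the group's first value."""
    groups = []
    for x in sorted_values:
        if groups and x - groups[-1][0] < width:
            groups[-1].append(x)
        else:
            groups.append([x])
    return groups


def group_by_gap(sorted_values, width=0.5):
    """Start a new group once a value is `width` or more above its predecessor."""
    groups = []
    for x in sorted_values:
        if groups and x - groups[-1][-1] < width:
            groups[-1].append(x)
        else:
            groups.append([x])
    return groups


C = [1.1, 1.2, 1.3, 1.4, 1.5, 1.6]
print(group_by_anchor(C))  # [[1.1, 1.2, 1.3, 1.4, 1.5], [1.6]]
print(group_by_gap(C))     # [[1.1, 1.2, 1.3, 1.4, 1.5, 1.6]]
```

The anchor variant guarantees that no two values in a group are `width` or more apart, which matches the left-to-right preference stated in the question.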

Answer 3

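There are clearly many ways to solve this. I'd like to share an alternative approach that uses a grouper function and the standard library. I also define a convenience function, average_similar. Usage examples: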

# Convert, sort and group.  Print generated groups.
A = ['1', '1.01', '1.1', '2', '3', '3.2', '4', '5']
a1 = sorted(float(f) for f in A)
g1 = grouper(a1)

print("Grouped A:", g1)
# Grouped A: [[1.0, 1.01, 1.1], [2.0], [3.0, 3.2], [4.0], [5.0]]


# Generate new list as average of each group.
g2 = (mean(g) for g in grouper(a1))
a2 = list(g2)

print("Averaged grouped A:", a2)
# Averaged grouped A: [1.0366666666666668, 2.0, 3.1, 4.0, 5.0]

print("Averaged grouped A:", average_similar(A, width=0.5))
# Averaged grouped A: [1.0366666666666668, 2.0, 3.1, 4.0, 5.0]


# Generate new list as rounded averages of each group.
g3 = (round(mean(g), 2) for g in grouper(a1))
a3 = list(g3)

print("Averaged grouped and rounded A:", a3)
# Averaged grouped and rounded A: [1.04, 2.0, 3.1, 4.0, 5.0]

print("Averaged grouped and rounded A:", average_similar(A, 0.5, 2))
# Averaged grouped and rounded A: [1.04, 2.0, 3.1, 4.0, 5.0]


# A more compact example given a list of numbers.
C = [1.1, 1.2, 1.3, 1.5, 1.6, 1.4]
# In-place sort.
C.sort() 
lc = list(round(mean(g), 2) for g in grouper(C))
print("Average C", lc)
# Average C [1.3, 1.6]
print("Average C", average_similar(C, precision=2))
# Average C [1.3, 1.6]


# Another example as a one-liner.
D = ['1', '1.01', '1.1', '2', '3', '3.2', '4', '5', '5.1', '6', '2.5']
ld = list(round(mean(g), 2)
          for g in grouper(
                  sorted(float(f) for f in D)))
print("Average D", ld)
# Average D [1.04, 2.0, 2.5, 3.1, 4.0, 5.05, 6.0]
print("Average D", average_similar(D, width=0.5, precision=2))
# Average D [1.04, 2.0, 2.5, 3.1, 4.0, 5.05, 6.0]

These examples use the following code:

import itertools
from statistics import mean

def make_keyfcn(width=0.5):
    """Return a stateful key function that implements the grouping criteria."""
    key = None
    def keyfcn(x):
        """As long as x is < key, keep returning key.

        Update when x >= key.
        """
        nonlocal key
        if key is None:  # When called the first time.
            key = x + width
        elif x >= key:
            key = x + width
        return key
    return keyfcn

def grouper(iterable, criteria=None):
    """Split a sorted iterable into lists of similar items."""
    if criteria is None:
        criteria = make_keyfcn()
    result = []
    for k, g in itertools.groupby(iterable, criteria):
        result.append(list(g))
    return result

# Defined last because its default argument refers to make_keyfcn.
def average_similar(iterable, width=0.5, precision=None, criteria=make_keyfcn):
    """Return a list where similar numbers have been averaged.

    Items are grouped using the supplied width and criteria and the
    result is rounded to precision if it is supplied.  Otherwise
    averages are not rounded.

    """
    lst = sorted(float(f) for f in iterable)
    g1 = (mean(g) for g in grouper(lst, criteria(width)))
    if precision is not None:
        g1 = (round(g, precision) for g in g1)
    return list(g1)
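
One detail worth keeping in mind: the function returned by make_keyfcn remembers its current threshold between calls, so a single key function should not be reused across lists. Here is a minimal sketch of the effect. It repeats a trimmed copy of make_keyfcn so that the snippet runs on its own; the helper `groups` and the sample lists are made up for illustration.

```python
import itertools


def make_keyfcn(width=0.5):
    """Trimmed copy of the key-function factory above."""
    key = None

    def keyfcn(x):
        nonlocal key
        if key is None or x >= key:
            key = x + width
        return key

    return keyfcn


def groups(values, keyfcn):
    # Hypothetical helper: collect the runs produced by itertools.groupby.
    return [list(g) for _, g in itertools.groupby(values, keyfcn)]


shared = make_keyfcn()
print(groups([1.0, 1.2], shared))       # [[1.0, 1.2]]
# `shared` still holds the threshold 1.5 from the first list:
print(groups([1.3, 1.4, 1.6], shared))  # [[1.3, 1.4], [1.6]]
# A fresh key function starts over with its threshold at 1.3 + 0.5:
print(groups([1.3, 1.4, 1.6], make_keyfcn()))  # [[1.3, 1.4, 1.6]]
```

Both grouper (when called without criteria) and average_similar create a fresh key function on every call, so this only matters when passing a key function in by hand.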

Answer 4

After I posted the question, someone replied with the following code:

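from collections import OrderedDict

A = [1, 1.01, 1.02, 2, 3, 4, 4, 4.1]

d = OrderedDict()

# Bucket each value by its integer multiple of 0.25. Bucket edges are fixed,
# so two close values on either side of an edge (e.g. 0.24 and 0.26) are not merged.
for item in A:
    d.setdefault(int(item / 0.25), []).append(item)

# Average each bucket once the loop has finished (Python 3 syntax).
A = [sum(item) / len(item) for item in d.values()]

print(A)
# Python 3 prints roughly [1.01, 2.0, 3.0, 4.0333]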

So far this code has worked well, although I haven't tested every detail of it. Thanks to whoever posted it and then deleted it!
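
For the light-curve use case mentioned in the question, the grouping can be done on observation times while averaging both the time and the measurement. This is only a sketch under assumptions not stated in the thread: observations are (time, magnitude) tuples, the name `merge_observations` and its `max_dt` parameter are hypothetical, the sample numbers are invented, and a plain unweighted mean is used (real photometry may call for weighting by measurement errors).

```python
from statistics import mean


def merge_observations(observations, max_dt=0.5):
    """Merge observations taken less than `max_dt` after the first one in their group.

    `observations` is an iterable of (time, magnitude) pairs (assumed layout).
    Returns a list of (mean_time, mean_magnitude) pairs.
    """
    groups = []
    for t, mag in sorted(observations):          # sort by time
        if groups and t - groups[-1][0][0] < max_dt:
            groups[-1].append((t, mag))          # close to the group's first time
        else:
            groups.append([(t, mag)])            # start a new group
    return [(mean(t for t, _ in g), mean(m for _, m in g)) for g in groups]


# Invented sample data: (time in days, magnitude)
obs = [(10.0, 18.0), (10.25, 18.5), (12.0, 17.0), (15.0, 16.0), (15.25, 16.5)]
print(merge_observations(obs))
# [(10.125, 18.25), (12.0, 17.0), (15.125, 16.25)]
```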

Python - Average similar values in a list and create a new list with the averages
