How to fill missing zero values with the average of their neighbors

Time: 2019-08-02 14:00:27

Tags: python python-3.x

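I have a problem, but I have not been able to write any code for it. I don't know how to approach it, which is why I'm not posting any code. If anyone can help, please do.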

Suppose I have some data like the following:

my_list = [[0, 0, 1, 2, 3, 4, 0, 2, 0, 0], [1, 3, 4, 5, 0, 3, 0, 0, 0]]

I am trying to fill the zeros with averages (like a smoothing operation).

Let's take the first inner list, [0, 0, 1, 2, 3, 4, 0, 2, 0, 0]

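1) The leading zeros can be filled with (nonzero left bound + nonzero right bound) / (number of values). There is no left bound here, because this is the first number in the list, and the nonzero right bound is 1. So (0+0+1)/3 ≈ 0.3. Now fill all three values with 0.3, and the inner list becomes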

[0.3, 0.3, 0.3, 2, 3, 4, 0, 2, 0, 0]

2) Fill the zero in the middle: (4+0+2)/3 = 6/3 = 2.

[0.3, 0.3, 0.3, 2, 3, 2, 2, 2, 0, 0]

3) Fill the last two zeros with (2 + 0 + 0)/3 = 2/3 ≈ 0.6

[0.3, 0.3, 0.3, 2, 3, 2, 2, 0.6, 0.6, 0.6]

Similarly, fill all the remaining zeros in each inner list.
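As a quick check of the arithmetic in steps 1 to 3, the sketch below recomputes each average for the first inner list. The variable names (`row`, `step1`, and so on) are made up for illustration, and it assumes each window is a run of zeros plus its nonzero neighbours, as described above. The exact values are 1/3 and 2/3; the 0.3 and 0.6 shown in the steps are truncated.

```python
# Hand-check of the three fill steps for the first inner list.
# Assumption: each window is a run of zeros plus its nonzero neighbour(s).
row = [0, 0, 1, 2, 3, 4, 0, 2, 0, 0]

step1 = sum(row[0:3]) / 3    # leading zeros + right bound 1 -> 1/3
step2 = sum(row[5:8]) / 3    # left bound 4, one zero, right bound 2 -> 2.0
step3 = (step2 + 0 + 0) / 3  # left bound is the already-filled 2, then two zeros -> 2/3

# Positions 0-2 get step1, 5-6 keep step2, and 7-9 are overwritten by step3.
filled = [step1] * 3 + row[3:5] + [step2] * 2 + [step3] * 3
print(filled)
```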

2 answers:

Answer 0: (score: 3)

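This is a very naive and somewhat messy implementation, but it should do what you want. Unfortunately, there are a lot of edge cases to handle here, which is where most of the mess comes from.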

# create a new list to hold our modified sublists
smoothed_list = []
# do for each sublist of my_list
for lst in my_list:
    # we'll be building a new list manually out of elements from the old list.
    # this is too complicated to do in a list comprehension, unfortunately.
    new_lst = [lst[0]]
    # The number of contiguous zeroes we've seen
    zero_ct = 0
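    # the last nonzero element that we saw; start from the first element,
    # which is already in new_lst, so a leading nonzero value is not lost
    last_nonzero_element = lst[0]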
    # step through this list one element at a time
    # I'm iterating by index instead of by element so that I can check for the last element
    for idx in range(1, len(lst)):
        elem = lst[idx]
        # If the current element is zero, then just add to the zero count.
        # note that if the last element is zero, we would end up with the wrong-size list
        # therefore we must take the other branch no matter what on the last element of the list
        if elem == 0 and idx < len(lst) - 1:
            zero_ct += 1
        # Otherwise, we either resolve the recent chain of zeroes, or just
        # add the current element to the new list.
        else:
            # If this is the first nonzero value in a while, or if this is a zero
            # at the end of the list that we need to resolve
            if zero_ct > 0 or (elem == 0 and idx == len(lst) - 1):
                # calculate the average of the range between last nonzero value and this value
                avg_to_replace = (last_nonzero_element + elem) / (zero_ct + 2)
                # remove the last element of new_lst, and replace it with the average we calculated
                # also add all the elements we've skipped so far, as well as the current element
                new_lst = new_lst[:-1] + [avg_to_replace] * (zero_ct + 2)
            else:
                # just add this nonzero element to the list
                new_lst.append(elem)
            # since we hit a nonzero element, reset the zero count and last_nonzero_element
            zero_ct = 0
            last_nonzero_element = elem
    # append our newly-created smoothed list to the list of smoothed lists.
    smoothed_list.append(new_lst)

Applied to the my_list given in the example,

my_list = [
           [0, 0, 1, 2, 3, 4, 0, 2, 0, 0], 
           [1, 3, 4, 5, 0, 3, 0, 0, 0]
          ]

this gives the following:

[
 [0.3333333333333333, 0.3333333333333333, 0.3333333333333333, 2, 3, 2.0, 2.0, 0.6666666666666666, 0.6666666666666666, 0.6666666666666666], 
 [1, 3, 4, 2.6666666666666665, 2.6666666666666665, 0.75, 0.75, 0.75, 0.75]
]

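You'll notice that, in the second example, the program computes the average from the list's original values even when a value has already been changed by an earlier replacement. The fourth-to-last element of the second list becomes 2.66 after the first replacement, so the last four elements would all be 0.66; instead, the program behaves as though that element were still 3 when calculating the average. This is a hard bug to fix, and you'll have to decide which behavior you prefer.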
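The difference between the two behaviours comes down to which left bound is used for the trailing zeros of the second list. A small sketch of just that arithmetic follows; the variable names are illustrative and not taken from the program above.

```python
# Tail of the second list: a nonzero left bound followed by three trailing zeros.
original_bound = 3                # value at that position in the input list
updated_bound = (5 + 0 + 3) / 3   # same position after the earlier fill (about 2.67)

window = 4  # the left bound plus three zeros

# What the program above produces: the average uses the original value.
avg_from_original = (original_bound + 0) / window

# What you would get if already-smoothed values were reused instead.
avg_from_updated = (updated_bound + 0) / window

print(avg_from_original, avg_from_updated)
```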

I'll leave "getting rid of the decimal precision" as an exercise for the reader.
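One possible approach to that exercise, as a sketch only: round every value after smoothing with the built-in `round`. The helper name `round_nested` and the default of two decimal places are assumptions made for illustration, not part of the answer above.

```python
def round_nested(rows, ndigits=2):
    """Return a copy of a list of lists with every value rounded to ndigits."""
    return [[round(value, ndigits) for value in row] for row in rows]


# Hypothetical sample shaped like the smoothed output above.
example = [[0.3333333333333333, 2, 3, 2.0, 0.6666666666666666]]
rounded = round_nested(example)
print(rounded)
```

Integers pass through `round` unchanged, so untouched values such as 2 and 3 stay integers.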

Answer 1: (score: 2)

I thought I'd give it a try. Here goes:

def smooth(ls):
    left = 0
    for right, e in enumerate(ls):
        if e and ls[max(right - 1, 0)] != 0: # ignore consecutive nonzeros
            left = right
        if (e and left != right) or (not e and right == len(ls) - 1): # e is nonzero with zero(s) before it, or is last trailing zero
            avg = round((ls[left] + ls[right]) / (right - left + 1), 2) # 2 decimal places
            for ptr in range(left, right + 1): # flatten from 'left' to 'right', inclusive
                ls[ptr] = avg
            left = right # move up the left index to the last changed item
    return ls

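Essentially, the function walks the list once using two index variables, left and right. Each time it reaches a nonzero entry that has zeros before it, the range of entries from left to right (inclusive) is "flattened" to their average. The left pointer then moves up to the last changed index, and the process continues until the end of the list is reached.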
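To make the "flatten" step concrete, here is a single step in isolation on the second example list. This is a sketch: the `left` and `right` indices are picked by hand, whereas the function above finds them while scanning, and the slice assignment is just one way to write the inner loop.

```python
# One "flatten" step in isolation, on the second example list.
# left/right are chosen by hand here for illustration.
ls = [1, 3, 4, 5, 0, 3, 0, 0, 0]
left, right = 3, 5  # nonzero 5, a single zero, nonzero 3

width = right - left + 1
avg = round((ls[left] + ls[right]) / width, 2)

# Slice assignment overwrites positions left..right (inclusive) in place.
ls[left:right + 1] = [avg] * width
print(ls)
```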

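The first output matches the output of Green Cloak Guy's program. The second is slightly different, because his uses the list's original values and mine does not.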

>>> list(map(smooth, [[0, 0, 1, 2, 3, 4, 0, 2, 0, 0], [1, 3, 4, 5, 0, 3, 0, 0, 0]]))
[[0.33, 0.33, 0.33, 2, 3, 2.0, 2.0, 0.67, 0.67, 0.67], [1, 3, 4, 2.67, 2.67, 0.67, 0.67, 0.67, 0.67]]

In the example I round to two decimal places, but that can easily be changed as needed.