Question

I couldn't summarize the problem well in the title. I'm writing some code, and in one part of it I need to compute the following:

Suppose we have a vector (e.g. a numpy array):

a = [3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3]

We want to turn any number greater than 5 into 5:

a = [3.2, 4, 5, 2, 5, 5, 5, 1.7, 2, 5, 5, 1, 3]

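Then we sum each run of consecutive 5s together with the number that follows it, and replace all of those elements with the resulting sum: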

a = [3.2, 4, 5+ 2, 5+ 5+ 5+ 1.7, 2, 5+ 5+ 1, 3]

So the resulting array would be:

a = [3.2, 4, 7, 16.7, 2, 11, 3]

I can do this with an explicit loop like this:

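    import numpy as np

    a = np.array([3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3])
    indx = np.where(a > 5)[0]
    a[indx] = 5  # cap everything above 5
    counter = 0
    c = []
    while counter < len(a):
        elem = a[counter]
        if elem != 5:
            c.append(elem)
        else:
            temp = 0
            # accumulate the run of 5s (assumes a smaller value follows the run)
            while elem == 5:
                temp += elem
                counter += 1
                elem = a[counter]
            temp += elem  # add the number that follows the run
            c.append(temp)
        counter += 1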

Is there a way to avoid the explicit loop? Perhaps by using the indx variable?

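I have a vague idea that involves converting it to a string: a = '[3.2, 4, 5, 2, 5, 5, 5, 1.7, 2, 5, 5, 1, 3]', then changing ' 5,' to ' 5+', and then using eval(a). But is there an efficient way to find all indices that contain a substring? And what about the fact that strings are immutable?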

Answer 1

This does what you want (all in vectorized numpy):

import numpy as np

# prepend a 0 so the first diff returns the first element
# (append a 0 as well only if the array ends in a run of values above 5)
a = np.array([0, 3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3])
aa = np.where(a>5, 5, a) # clip values to 5, can use np.clip(a, None, 5) too...
c = np.cumsum(aa) # get cumulative sum
np.diff(c[aa < 5]) # keep the cumulative sums where the clipped value is below 5, then diff to get each group's total

array([ 3.2,  4. ,  7. , 16.7,  2. , 11. ,  3. ])
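
The cumsum/diff approach in this answer can be wrapped in a function. The following is only a minimal sketch: the name `sum_capped_runs` and the `cap` parameter are hypothetical, and it assumes `cap` is positive so that the padding zeros count as "below the cap". It appends a trailing zero only when the input ends in a run of capped values, so that such a run is kept and inputs ending in a small value get no extra 0 in the result.

```python
import numpy as np

def sum_capped_runs(values, cap=5):
    """Hypothetical helper: cap values at `cap`, then merge every run of
    capped values into the first smaller value that follows it."""
    capped = np.minimum(np.asarray(values, dtype=float), cap)
    # Leading 0 lets np.diff recover the first group's total.
    padded = np.concatenate(([0.0], capped))
    if capped.size and capped[-1] >= cap:
        # Trailing 0 flushes a run of capped values at the very end.
        padded = np.concatenate((padded, [0.0]))
    totals = np.cumsum(padded)
    # Cumulative sums at the "small" positions mark group boundaries;
    # differencing them gives one total per group.
    return np.diff(totals[padded < cap])

result = sum_capped_runs([3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3])
print(result)
```

Because each total is a difference of cumulative sums, it can differ from a direct sum by a tiny floating-point rounding error, so comparing with `np.allclose` is safer than exact equality.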

Answer 2

You can use pandas for the data manipulation: use cumsum and shift to build the groupby keys from the condition, then aggregate with sum.

import pandas as pd

df = pd.DataFrame(a, columns=['col1'])
df.loc[df.col1 > 5] = 5  # cap values above 5
s = df.col1.groupby((df.col1 != 5).cumsum().shift().fillna(0)).sum()

col1
0.0     3.2
1.0     4.0
2.0     7.0
3.0    16.7
4.0     2.0
5.0    11.0
6.0     3.0

To get a numpy array, just take .values

>>> s.values
array([  3.2,   4. ,   7. ,  16.7,   2. ,  11. ,   3. ])
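
To see why the groupby key in this answer works, it can help to print the intermediate labels. The sketch below is illustrative only: it uses a plain `Series` with `clip(upper=5)` instead of the answer's DataFrame assignment, and the names `col`, `keys` and `sums` are made up for this example.

```python
import pandas as pd

a = [3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3]
col = pd.Series(a).clip(upper=5)  # cap values at 5

# A new group starts right after every value that is not 5, so the running
# count of non-5 values, shifted down by one row, labels each group.
keys = (col != 5).cumsum().shift().fillna(0).astype(int)
print(keys.tolist())
# [0, 1, 2, 2, 3, 3, 3, 3, 4, 5, 5, 5, 6]

sums = col.groupby(keys).sum()  # one total per label
print(sums.tolist())
```

Each run of 5s shares a label with the smaller value that ends it, which is why summing per label reproduces the desired array.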

Answer 3

I think you can do this in a single pass. For each item:

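If the value is 5 or greater, don't append it to the list right away; "defer" a 5 for now.
If the value is less than 5, add it to all the "deferred" 5s, then append the sum.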

a = [3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3]
result = []

current_sum = 0
for item in a:
    if item < 5:
        result.append(current_sum + item)
        current_sum = 0
    else:
        current_sum += 5

if current_sum:
    result.append(current_sum)

>>> result
[3.2, 4, 7, 16.7, 2, 11, 3]

Calculating the sum of consecutive values greater than a constant in a vector?
