Conditional addition in pandas

Time: 2018-09-22 09:31:13

Tags: python pandas

I have the following dataframe in pandas

   Code      Sum      Quantity
   0         -12      0
   1          23      0
   2         -10      0
   3         -12      0
   4         100      0
   5         102      201
   6          34      0
   7         -34      0
   8         -23      0
   9         100      0
   10        100      0
   11        102      300

My desired dataframe is

   Code      Sum      Quantity    new_sum
   0         -12      0          -12
   1          23      0           23
   2         -10      0          -10
   3         -12      0          -12
   4         100      0           0
   5         102      201         202 
   6          34      0           34
   7         -34      0          -34
   8         -23      0          -23
   9         100      0           0
   10        100      0           0
   11        102      300         302

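The logic is

First I check for nonzero values in Quantity. In the sample data above, the first nonzero Quantity is 201, at index 5. Starting from that row, I want to add up the Sum column going backwards until I reach a negative value. The total (102 + 100 = 202) goes into new_sum at index 5, and the rows that were added in get 0. Rows outside such a run keep their Sum value.

I wrote code using an if inside a loop, but I need to scan more than a million rows, and it is not giving me the desired output.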
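To pin the rule down before looking at faster approaches, here is a minimal reference sketch in plain Python. The function name `new_sum_loop` is hypothetical, and two points are assumptions read off the desired table rather than stated in the question: a Sum of exactly 0 is treated as non-negative, and the backward walk also stops at an earlier nonzero-Quantity row.

```python
def new_sum_loop(sums, quantities):
    """Reference implementation of the rule, one backward walk per Quantity row."""
    out = list(sums)
    for i, q in enumerate(quantities):
        if q == 0:
            continue
        total = sums[i]
        j = i - 1
        # Walk backwards over non-negative Sum values; stop at a negative
        # value or at an earlier nonzero-Quantity row (assumption).
        while j >= 0 and sums[j] >= 0 and quantities[j] == 0:
            total += sums[j]
            out[j] = 0  # rows folded into the total are set to 0
            j -= 1
        out[i] = total
    return out


sums = [-12, 23, -10, -12, 100, 102, 34, -34, -23, 100, 100, 102]
quantities = [0, 0, 0, 0, 0, 201, 0, 0, 0, 0, 0, 300]
result = new_sum_loop(sums, quantities)
print(result)
```

Each row is visited at most once by a backward walk, so the work is linear in the number of rows. With a dataframe, the inputs could be taken from `df['Sum'].tolist()` and `df['Quantity'].tolist()`.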

2 Answers:

Answer 0: (score: 1)

Answer edited after clarification...

This one will be a bit slower because of the list comprehensions and the loop.

Setup:

import pandas as pd
import numpy as np

data = [[  0, -12,   0],
        [  1,  23,   0],
        [  2, -10,   0],
        [  3, -12,   0],
        [  4, 100,   0],
        [  5, 102, 201],
        [  6,  34,   0],
        [  7, -34,   0],
        [  8, -23,   0],
        [  9, 100,   0],
        [ 10, 100,   0],
        [ 11, 102, 300]]

df = pd.DataFrame(data, columns=['Code', 'Sum', 'Quantity'])

print(df)

    Code  Sum  Quantity
0      0  -12         0
1      1   23         0
2      2  -10         0
3      3  -12         0
4      4  100         0
5      5  102       201
6      6   34         0
7      7  -34         0
8      8  -23         0
9      9  100         0
10    10  100         0
11    11  102       300

Code:

# copy columns from input dataframe and invert
df1 = df[['Sum', 'Quantity']][::-1].copy()

# make an array to hold result column values
new_sum_array = np.zeros(len(df1)).astype(int)
df_sum = df1.Sum.values

# locate the indices of the pos values in "Quantity".
# these will be used for segmenting the "Sum" column values
posQ = (np.where(df1.Quantity > 0)[0])

# don't want zero or last index value in posQ for splitting
if posQ[0] == 0:
    posQ = posQ[1:]
# posQ can be empty here if the only positive Quantity was at position 0
if posQ.size and posQ[-1] == len(df) - 1:
    posQ = posQ[:-1]

# se (start-end)
# add first and last index to be used for indexing segments of df_sum
se = posQ.copy()
se = np.insert(se, 0, 0)
se = np.append(se, len(df))

starts = se[:-1]
ends = se[1:]

# keep only positive values from the df_sum array; everything else becomes 0.
# this is used with numpy argmin to find the first non-positive number
# within each segment
only_pos = np.where(df_sum > 0, df_sum, 0)

# split the only_pos array at the Quantity locations.
# np.split returns a list; keep it as a list because the segments
# can have different lengths
segs = np.split(only_pos, posQ)

# find the index of the first non-positive number within each segment
tgts = [np.argmin(x) for x in segs]

# use the indices to slice each segment and put the result into
# the result array
for i in range(len(segs)):
    idx = np.arange(starts[i], ends[i])
    np.put(new_sum_array, idx[tgts[i]:], df_sum[idx][tgts[i]:])

# to add a lookback limit for adding consecutive positive df_sums,
# assign an integer value to max_lookback in next line.
# use "None" to ignore any limit
max_lookback = None
if max_lookback is not None:
    tgts = np.clip(tgts, 0, max_lookback)

# add up the values of the positive numbers in the sliced
# df_sum segments
sums = [np.sum(x[:l]) for x, l in zip(segs, tgts)]

# put those totals into the result array at the positive "Quantity" locations
np.put(new_sum_array, starts, sums)

# add the results to the reversed copy as "New Sum"
df1['New Sum'] = new_sum_array

# flip the dataframe back upright
df1 = df1[::-1]
# insert the calculated column into the original dataframe
df['New Sum'] = df1['New Sum']

Result:

print(df)

    Code  Sum  Quantity  New Sum
0      0  -12         0      -12
1      1   23         0       23
2      2  -10         0      -10
3      3  -12         0      -12
4      4  100         0        0
5      5  102       201      202
6      6   34         0       34
7      7  -34         0      -34
8      8  -23         0      -23
9      9  100         0        0
10    10  100         0        0
11    11  102       300      302
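The core of the approach above is the split-and-argmin step, which is easier to follow in isolation. The sketch below hard-codes the reversed Sum column of the sample data and the single split position 6 (the reversed position of the row with Quantity 201); the names `rev_sum`, `run_lengths` and `totals` are only illustrative.

```python
import numpy as np

# Sum column of the sample data in reverse row order, so that each
# nonzero-Quantity row comes first in its segment
rev_sum = np.array([102, 100, 100, -23, -34, 34, 102, 100, -12, -10, 23, -12])

# non-positive values become 0, so argmin lands on the first of them
only_pos = np.where(rev_sum > 0, rev_sum, 0)

# one segment per nonzero-Quantity row
segs = np.split(only_pos, [6])

# length of the leading run of positive values in each segment
run_lengths = [int(np.argmin(seg)) for seg in segs]

# total of each leading run
totals = [int(seg[:n].sum()) for seg, n in zip(segs, run_lengths)]

print(run_lengths)  # [3, 2]
print(totals)       # [302, 202]
```

One caveat of this trick: if a segment contains no non-positive value at all, `np.argmin` returns the position of the smallest positive value instead, so such a segment would need separate handling.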

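Answer 1: (score: 0)

You can group the dataframe by the cumulative sum of the reversed nonzero-Quantity flags, compute the sum of all non-negative Sum values in each group, and assign the result back to a column.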

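# negative Sum values (note: this column is replaced by the assignment on the last line)
df['New sum'] = df[df.Sum.lt(0)]['Sum']
# group the rows so that each group ends at a nonzero-Quantity row,
# then sum the non-negative Sum values in each group
a = df.groupby([df.Quantity.ne(0)[::-1].cumsum()])['Sum'].apply(lambda x:x[x.ge(0)].sum())[::-1]
# assign the totals at the nonzero-Quantity rows; all other rows become NaN
df['New sum'] = pd.Series(a.values,index=df[df.Quantity.ne(0)].index)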

Out:

   Code  Sum  Quantity  New sum
0     0  -12         0      NaN
1     1  -23         0      NaN
2     2  -12         0      NaN
3     3  100         0      NaN
4     4  102       201    202.0
5     5  -34         0      NaN
6     6  -23         0      NaN
7     7  100         0      NaN
8     8  100         0      NaN
9     9  102       300    302.0
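Summing every non-negative value in a group would, on the 12-row sample from the question, also count an isolated positive value such as the 34 at Code 6. A possible vectorized variant that only sums the contiguous run ending at each nonzero-Quantity row is sketched below. The function name `conditional_sum` is hypothetical, and it assumes that a Sum of 0 counts as non-negative and that a new run starts right after every nonzero-Quantity row.

```python
import pandas as pd


def conditional_sum(df):
    """Return the new_sum column without an explicit Python loop."""
    s = df['Sum']
    has_q = df['Quantity'].ne(0)
    neg = s.lt(0)

    # A new block starts at every negative Sum and on the row right after
    # a nonzero-Quantity row (assumption), so a block is one negative row
    # (if any) followed by a run of non-negative rows.
    block = (neg | has_q.shift(fill_value=False)).cumsum()

    # Blocks that contain a nonzero-Quantity row are the ones to collapse.
    closed = has_q.astype(int).groupby(block).transform('sum').gt(0)

    # Total of the non-negative values in each block, broadcast to its rows.
    run_total = s.where(~neg, 0).groupby(block).transform('sum')

    # Rows folded into a total become 0; the Quantity row gets the total.
    out = s.mask(closed & ~neg & ~has_q, 0)
    out = out.mask(has_q & ~neg, run_total)
    return out


df = pd.DataFrame({
    'Code': range(12),
    'Sum': [-12, 23, -10, -12, 100, 102, 34, -34, -23, 100, 100, 102],
    'Quantity': [0, 0, 0, 0, 0, 201, 0, 0, 0, 0, 0, 300],
})
df['new_sum'] = conditional_sum(df)
print(df)
```

Everything here is a column-wise operation (`cumsum`, `groupby(...).transform`, `mask`), which is usually a better fit for a million rows than per-segment Python loops, though that has not been timed here.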