I have the following dataframe in pandas:
Code Sum Quantity
0 -12 0
1 23 0
2 -10 0
3 -12 0
4 100 0
5 102 201
6 34 0
7 -34 0
8 -23 0
9 100 0
10 100 0
11 102 300
The dataframe I want:
Code Sum Quantity new_sum
0 -12 0 -12
1 23 0 23
2 -10 0 -10
3 -12 0 -12
4 100 0 0
5 102 201 202
6 34 0 34
7 -34 0 -34
8 -23 0 -23
9 100 0 0
10 100 0 0
11 102 300 302
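The logic:
First, I check for non-zero values in Quantity. In the sample data above, the first non-zero value of Quantity is 201, at index 5. I then want to add up the Sum column backward from that row until index 4, i.e. over the unbroken run of positive Sum values ending at that row. The Quantity row gets the total (100 + 102 = 202), the other rows of the run get 0, and all other rows keep their Sum. The same applies at index 11 (100 + 100 + 102 = 302).
I wrote code using if statements in a loop, but I need to scan more than a million rows, and it does not give me the desired output.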
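Before optimizing, the rule can be pinned down with a plain single-pass loop. This is a sketch, not the poster's code: the function name `new_sum_loop` is hypothetical, and it assumes a run also stops at the previous non-zero Quantity row (a case the sample data never exercises). It is pure Python, so it will be slower than vectorized code on a million rows, but each row is visited at most twice, so it stays linear.

```python
import pandas as pd


def new_sum_loop(sums, quantities):
    """Single-pass reference version of the rule read off the desired output.

    At each row with a non-zero Quantity, walk backwards over the unbroken
    run of positive Sum values (stopping at a non-positive Sum or at the
    previous non-zero Quantity row). That row gets the run total, the other
    rows of the run get 0, and every remaining row keeps its own Sum.
    """
    result = list(sums)
    for i, q in enumerate(quantities):
        if q == 0:
            continue
        total = 0
        j = i
        # assumption: the previous non-zero Quantity row also ends the run
        while j >= 0 and sums[j] > 0 and (j == i or quantities[j] == 0):
            total += sums[j]
            result[j] = 0
            j -= 1
        result[i] = total
    return result


df = pd.DataFrame({
    'Code': range(12),
    'Sum': [-12, 23, -10, -12, 100, 102, 34, -34, -23, 100, 100, 102],
    'Quantity': [0, 0, 0, 0, 0, 201, 0, 0, 0, 0, 0, 300],
})
df['new_sum'] = new_sum_loop(df['Sum'].tolist(), df['Quantity'].tolist())
print(df)
```

On the sample data this reproduces the desired `new_sum` column, which makes it a handy check for faster versions.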
Answer 0 (score: 1)
Answer edited after clarification...
This version is a bit slower because of the list comprehension and the loop.
Setup:
import pandas as pd
import numpy as np
data = [[ 0, -12, 0],
[ 1, 23, 0],
[ 2, -10, 0],
[ 3, -12, 0],
[ 4, 100, 0],
[ 5, 102, 201],
[ 6, 34, 0],
[ 7, -34, 0],
[ 8, -23, 0],
[ 9, 100, 0],
[ 10, 100, 0],
[ 11, 102, 300]]
df = pd.DataFrame(data, columns=['Code', 'Sum', 'Quantity'])
print(df)
Code Sum Quantity
0 0 -12 0
1 1 23 0
2 2 -10 0
3 3 -12 0
4 4 100 0
5 5 102 201
6 6 34 0
7 7 -34 0
8 8 -23 0
9 9 100 0
10 10 100 0
11 11 102 300
Code:
# copy the columns from the input dataframe and reverse their order
df1 = df[['Sum', 'Quantity']][::-1].copy()
# make an array to hold the result column values
new_sum_array = np.zeros(len(df1), dtype=int)
df_sum = df1.Sum.values
# locate the indices of the positive values in "Quantity".
# these will be used for segmenting the "Sum" column values
posQ = np.where(df1.Quantity > 0)[0]
# don't want the zero or last index value in posQ for splitting
# (like the sample, this assumes the last row has a non-zero Quantity)
if posQ[0] == 0:
    posQ = posQ[1:]
if len(posQ) and posQ[-1] == len(df) - 1:
    posQ = posQ[:-1]
# se (start-end)
# add the first and last index to be used for indexing segments of df_sum
se = posQ.copy()
se = np.insert(se, 0, 0)
se = np.append(se, len(df))
starts = se[:-1]
ends = se[1:]
# keep only the positive values from the df_sum array (others become 0).
# this is used to find the first non-positive number within each segment
only_pos = np.where(df_sum > 0, df_sum, 0)
# split the only_pos array at the Quantity locations.
# keep the segments in a plain list: they usually differ in length, and
# np.array() on a ragged list raises a ValueError in recent NumPy
segs = np.split(only_pos, posQ)
# index of the first non-positive number in each segment, or the segment
# length if every value is positive (np.argmin would instead point at the
# smallest positive value in that case)
tgts = [int(np.argmax(x == 0)) if (x == 0).any() else len(x) for x in segs]
# from the first non-positive number onward, each segment keeps its
# original "Sum" values in the result array
for start, end, t in zip(starts, ends, tgts):
    idx = np.arange(start, end)[t:]
    new_sum_array[idx] = df_sum[idx]
# to add a lookback limit for adding consecutive positive df_sums,
# assign an integer value to max_lookback in the next line.
# use None to ignore any limit
max_lookback = None
if max_lookback is not None:
    tgts = np.clip(tgts, 0, max_lookback)
# add up the positive numbers at the start of each (reversed) segment
sums = [np.sum(x[:l]) for x, l in zip(segs, tgts)]
# put those totals into the result array at the positive "Quantity" locations
np.put(new_sum_array, starts, sums)
# add the results to df1 as "New Sum"
df1['New Sum'] = new_sum_array
# flip the dataframe back upright
df1 = df1[::-1]
# insert the calculated column into the original dataframe (aligned on the index)
df['New Sum'] = df1['New Sum']
Result:
print(df)
Code Sum Quantity New Sum
0 0 -12 0 -12
1 1 23 0 23
2 2 -10 0 -10
3 3 -12 0 -12
4 4 100 0 0
5 5 102 201 202
6 6 34 0 34
7 7 -34 0 -34
8 8 -23 0 -23
9 9 100 0 0
10 10 100 0 0
11 11 102 300 302
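Because the question mentions more than a million rows, a loop-free variant may be worth benchmarking against this answer. The sketch below is not part of the answer: `new_sum_vectorized` is a hypothetical name, and it assumes the same rule as the desired output, with a run also cut right after any non-zero Quantity row. It labels each run of positive Sum values using `shift` and `cumsum`, then spreads the run totals with `groupby(...).transform` (the `fill_value` argument of `shift` needs pandas 0.24 or later).

```python
import pandas as pd


def new_sum_vectorized(df):
    """Loop-free sketch: label runs of positive Sum values, then use groupby."""
    s = df['Sum']
    pos = s > 0
    nz = df['Quantity'] != 0
    # a new block starts at a non-positive row, right after a non-positive
    # row, or right after a non-zero Quantity row (assumed run boundary)
    start = ~pos | ~pos.shift(1, fill_value=False) | nz.shift(1, fill_value=False)
    block = start.cumsum()
    # total of the positive values in each block
    block_total = s.where(pos, 0).groupby(block).transform('sum')
    # does the block contain a non-zero Quantity row? (such a row is
    # always the last row of its block, because blocks are cut after it)
    has_qty = nz.astype(int).groupby(block).transform('max').astype(bool)
    # positive rows of a run that feeds a Quantity row become 0 ...
    out = s.mask(pos & has_qty, 0)
    # ... and the Quantity row itself receives the run total
    return out.mask(nz, block_total)


df = pd.DataFrame({
    'Code': range(12),
    'Sum': [-12, 23, -10, -12, 100, 102, 34, -34, -23, 100, 100, 102],
    'Quantity': [0, 0, 0, 0, 0, 201, 0, 0, 0, 0, 0, 300],
})
df['New Sum'] = new_sum_vectorized(df)
print(df)
```

Every step is a column-wide pandas operation, so there is no Python-level loop over rows or segments.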
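Answer 1 (score: 0)
You can group the dataframe by the cumulative sum of the non-zero Quantity flags taken in reverse order, sum all the non-negative Sum values in each group, and assign the totals back to the column:
# (this first assignment is overwritten by the last line below)
df['New sum'] = df[df.Sum.lt(0)]['Sum']
# group key: each group ends at a row with a non-zero Quantity
a = df.groupby([df.Quantity.ne(0)[::-1].cumsum()])['Sum'].apply(lambda x: x[x.ge(0)].sum())[::-1]
# place the group totals on the non-zero Quantity rows; all other rows become NaN
df['New sum'] = pd.Series(a.values, index=df[df.Quantity.ne(0)].index)
Out: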
Code Sum Quantity New sum
0 0 -12 0 NaN
1 1 -23 0 NaN
2 2 -12 0 NaN
3 3 100 0 NaN
4 4 102 201 202.0
5 5 -34 0 NaN
6 6 -23 0 NaN
7 7 100 0 NaN
8 8 100 0 NaN
9 9 102 300 302.0
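The grouping key in this answer makes every group end at a non-zero Quantity row, but the lambda sums every non-negative Sum in the group, not only the unbroken positive run just before that row. The output above was produced on different data than the question's (10 rows, without the positive 23 and 34 values), so the difference does not show there. The sketch below runs the answer's lambda and a hypothetical trailing-run variant (`trailing_positive_sum`, not part of the answer) on the question's data, keeping the answer's grouping key.

```python
import numpy as np
import pandas as pd

# sample data from the question (note Sum = 23 at index 1 and 34 at index 6)
df = pd.DataFrame({
    'Sum': [-12, 23, -10, -12, 100, 102, 34, -34, -23, 100, 100, 102],
    'Quantity': [0, 0, 0, 0, 0, 201, 0, 0, 0, 0, 0, 300],
})

# same grouping key as in the answer: each group ends at a non-zero Quantity row
key = df.Quantity.ne(0)[::-1].cumsum()


def all_positive_sum(x):
    # what the answer's lambda does: every non-negative value in the group
    return x[x.ge(0)].sum()


def trailing_positive_sum(x):
    # walk backwards from the Quantity row and stop at the first non-positive Sum
    vals = x.to_numpy()[::-1]
    run = np.cumprod(vals > 0).astype(bool)
    return vals[run].sum()


qty_index = df.index[df.Quantity.ne(0)]
as_answered = pd.Series(
    df.groupby(key)['Sum'].apply(all_positive_sum)[::-1].values, index=qty_index)
trailing = pd.Series(
    df.groupby(key)['Sum'].apply(trailing_positive_sum)[::-1].values, index=qty_index)
print(as_answered.tolist())  # [225, 336]
print(trailing.tolist())     # [202, 302]
```

Only the trailing-run version matches the 202 and 302 in the question's desired output; the other rows would still need to be set to 0 or to their own Sum.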