I currently have a DataFrame like the one below, and I need to reset the cumsum every time it passes a multiple of 1000 (e.g. 2000, 3000, etc.).
Production ID cumsum
2017-10-19 1054 1323217 1054
2017-10-20 0 1323217 1054
2017-10-21 0 1323217 1054
2017-10-22 0 1323217 1054
2017-10-23 0 1323217 1054
For example, given the above, I need a df that looks like this:
Production ID cumsum adjCumsum numberGenerated
2017-10-19 1054 1323217 1054 1000 1
2017-10-20 0 1323217 1054 54 0
2017-10-21 0 1323217 1054 54 0
2017-10-22 3054 1323217 4108 4000 4
2017-10-23 0 1323217 4108 108 0
2017-10-23 500 1323218 500 500 0
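The code below correctly resets the value every 1000, but I can't quite figure out how to convert it so that it groups by ID and rounds down to the nearest 1000.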
maxvalue = 1000
lastvalue = 0
newcum = []
for row in df.iterrows():
    thisvalue = row[1]['cumsum'] + lastvalue
    if thisvalue > maxvalue:
        thisvalue = 0
    newcum.append( thisvalue )
    lastvalue = thisvalue
df['newcum'] = newcum
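Thanks to the answer below, I can now calculate the cumulative number generated, but I need to calculate the incremental number generated.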
df['cumsum'] = df.groupby('ID')['Production'].cumsum()
thresh = 1000
multiple = (df['cumsum'] // thresh )
mask = multiple.diff().ne(0)
df['numberGenerated'] = np.where(mask, multiple, 0)
df['adjCumsum'] = (df['numberGenerated'].mul(thresh)) + df['cumsum'] % thresh
df['cumsum2'] = df.groupby('ID')['numberGenerated'].cumsum()
My initial thinking was to try something similar to:
df['numGen1'] = df['cumsum2'].diff()
Final edit: tested and working. Thanks for your help!
I was overthinking it. Below is how I was able to do it:
df['cumsum'] = df.groupby('ID')['Production'].cumsum()
thresh = 1000
multiple = (df['cumsum'] // thresh )
mask = multiple.diff().ne(0)
df['numberGenerated'] = np.where(mask, multiple, 0)
df['adjCumsum'] = (df['numberGenerated'].mul(thresh)) + df['cumsum'] % thresh
df['cumsum2'] = df.groupby('ID')['numberGenerated'].cumsum()
numgen = []
adjcumsum = []
for i in range(len(df['cumsum'])):
    if df['cumsum'][i] > thresh and (df['ID'][i] == df['ID'][i-1]):
        numgenv = (df['cumsum'][i] // thresh) - (df['cumsum'][i-1] // thresh)
        numgen.append(numgenv)
    elif df['cumsum'][i] > thresh:
        numgenv = (df['cumsum'][i] // thresh)
        numgen.append(numgenv)
    else:
        numgenv = 0
        numgen.append(numgenv)
df['numgen2.0'] = numgen
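For comparison, the loop above can probably be replaced by a vectorized version. The following is only a sketch, not a tested drop-in: the helper name `incremental_generated` and the `example` frame are made up for illustration, and a default integer index is assumed. One difference worth noting: the loop tests `> thresh`, so a cumsum of exactly 1000 counts as 0 there, whereas floor division counts it as 1.

```python
import pandas as pd


def incremental_generated(df: pd.DataFrame, thresh: int = 1000) -> pd.Series:
    """Units newly generated on each row, counted separately per ID.

    Hypothetical helper for illustration; assumes 'ID' and 'Production' columns.
    """
    # Running production total, restarted for every ID.
    cumsum = df.groupby("ID")["Production"].cumsum()
    # How many whole multiples of thresh the running total has reached.
    multiple = cumsum // thresh
    # The same count on the previous row of the same ID (0 on each ID's first row).
    previous = multiple.groupby(df["ID"]).shift(fill_value=0)
    # The increment is the change in the count of whole multiples.
    return multiple - previous


# Illustrative data taken from the desired table in the question.
example = pd.DataFrame(
    {
        "Production": [1054, 0, 0, 3054, 0, 500],
        "ID": [1323217, 1323217, 1323217, 1323217, 1323217, 1323218],
    }
)
example["numgen"] = incremental_generated(example)
print(example)
```

Shifting within each ID group is what keeps the first row of a new ID from being compared with the last row of the previous one, which is the case the `df['ID'][i] == df['ID'][i-1]` check handles in the loop.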
Answer 0 (score: 2)
IIUC, this is just an integer-division problem with a couple of tricks:
import numpy as np

thresh = 1000
df['cumsum'] = df['Production'].cumsum()
# how many times cumsum has passed thresh
multiple = (df['cumsum'] // thresh )
# detect the rows where a new multiple of thresh is passed
mask = multiple.diff().ne(0)
# update the number generated:
df['numberGenerated'] = np.where(mask, multiple, 0)
# then the adjusted cumsum
df['adjCumsum'] = (df['numberGenerated'].mul(thresh)) + df['cumsum'] % thresh
Output:
Production ID cumsum adjCumsum numberGenerated
2017-10-19 1054 1323217 1054 1054 1
2017-10-20 0 1323217 1054 54 0
2017-10-21 0 1323217 1054 54 0
2017-10-22 3054 1323217 4108 4108 4
2017-10-23 0 1323217 4108 108 0
2017-10-23 500 1323218 4608 608 0
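Note that this output differs from the table requested in the question in two places: the cumsum is not restarted for ID 1323218 (4608 rather than 500), and adjCumsum keeps the remainder on the rows where a threshold is crossed (1054 and 4108 rather than 1000 and 4000). A minimal sketch of a per-ID variant follows, assuming the requested table is the target; the frame `df2` and the names `previous` and `crossed` are introduced here for illustration only, and a default integer index is assumed.

```python
import numpy as np
import pandas as pd

thresh = 1000

# Illustrative data taken from the desired table in the question.
df2 = pd.DataFrame(
    {
        "Production": [1054, 0, 0, 3054, 0, 500],
        "ID": [1323217, 1323217, 1323217, 1323217, 1323217, 1323218],
    }
)

# Restart the running total for every ID.
df2["cumsum"] = df2.groupby("ID")["Production"].cumsum()

# Whole multiples of thresh reached so far, and the value on the previous
# row of the same ID (0 on the first row of each ID).
multiple = df2["cumsum"] // thresh
previous = multiple.groupby(df2["ID"]).shift(fill_value=0)
crossed = multiple.ne(previous)

# On crossing rows report the multiple; elsewhere report 0.
df2["numberGenerated"] = np.where(crossed, multiple, 0)
# On crossing rows round down to the multiple of thresh; elsewhere keep the remainder.
df2["adjCumsum"] = np.where(crossed, multiple * thresh, df2["cumsum"] % thresh)
print(df2)
```

Comparing against a per-ID shifted copy instead of calling `diff()` on the whole column means the first row of each ID is judged only against 0, so a change of ID cannot be mistaken for a threshold crossing.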