df:
   val  wt
1  100   2
2  300   3
3  200   5
Required df:
   val  wt  cum_wt_avg
1  100   2         100
2  300   3         220
3  200   5         210
Formula:
cum_wt_avg[i] = cum_sum(val * wt)[i] / cum_sum(wt)[i]
Is there a simple way to do this in pandas or numpy? Something like this:
df["cum_wt_avg"] = pd.cum_mean(value=df.val, weight=df.wt)
Answer 0 (score: 0)
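I think it is best to avoid loops in pandas.
First multiply the two columns with mul, take the cumsum of the product, and then use div to divide by the cumsum of the wt column: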
df["cum_wt_avg"] = df['val'].mul(df['wt']).cumsum().div(df['wt'].cumsum())
print(df)
   val  wt  cum_wt_avg
1  100   2       100.0
2  300   3       220.0
3  200   5       210.0
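For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the sample frame from the question and applies the same chain. The helper name cum_weighted_mean is my own choice for illustration; it is not a pandas function.

```python
import pandas as pd


def cum_weighted_mean(values: pd.Series, weights: pd.Series) -> pd.Series:
    """Cumulative weighted mean: cumsum(values * weights) / cumsum(weights)."""
    # Hypothetical helper, named here only for illustration.
    return values.mul(weights).cumsum().div(weights.cumsum())


# Sample data from the question, with the same 1-based index
df = pd.DataFrame({"val": [100, 300, 200], "wt": [2, 3, 5]}, index=[1, 2, 3])
df["cum_wt_avg"] = cum_weighted_mean(df["val"], df["wt"])
print(df)
```

Wrapping the chain in a function only saves retyping; the computation is the same one-liner as above.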
To improve performance, work on the underlying NumPy arrays and use numpy.cumsum:
import numpy as np
a = df['val'].values
b = df['wt'].values
df["cum_wt_avg"] = np.cumsum(a * b) / np.cumsum(b)
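As a sanity check, the sketch below compares the NumPy result with the pandas chain on the sample data from the question. It assumes a pandas version that provides to_numpy(); .values would work the same way here.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"val": [100, 300, 200], "wt": [2, 3, 5]}, index=[1, 2, 3])

# NumPy version: operate on plain arrays, skipping pandas index handling
a = df["val"].to_numpy()
b = df["wt"].to_numpy()
numpy_result = np.cumsum(a * b) / np.cumsum(b)

# pandas version: same arithmetic, expressed with Series methods
pandas_result = df["val"].mul(df["wt"]).cumsum().div(df["wt"].cumsum())

# Both approaches should agree element by element
print(np.allclose(numpy_result, pandas_result.to_numpy()))
```

The NumPy version returns a bare array, so assigning it back to a column relies on position rather than index alignment.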
Timings:
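import numpy as np
import pandas as pd
from numba import jit
# Repeat the sample frame 1000 times to get a larger benchmark input
df = pd.concat([df]*1000)
# jpp's solution (numba)
@jit(nopython=True)
def cum_wavg(arr, res):
    # res is unused; it is kept only because the timing call below passes it
    return np.cumsum(arr[:, 0] * arr[:, 1]) / np.cumsum(arr[:, 1])
# Plain NumPy version
def jez1(df):
    a = df['val'].values
    b = df['wt'].values
    return np.cumsum(a * b) / np.cumsum(b)
print(jez1(df))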
In [184]: %timeit cum_wavg(df.values, res=np.zeros(len(df.index)))
65.5 µs ± 27.1 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [185]: %timeit df['val'].mul(df['wt']).cumsum().div(df['wt'].cumsum())
362 µs ± 6.26 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [186]: %timeit (jez1(df))
63.8 µs ± 491 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
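Outside IPython, where %timeit is not available, a similar comparison can be sketched with the standard timeit module. The function names pandas_version and numpy_version are placeholders of mine, and the numbers will differ from the ones above depending on machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Sample data from the question, repeated 1000 times as in the timings above
df = pd.DataFrame({"val": [100, 300, 200], "wt": [2, 3, 5]})
big = pd.concat([df] * 1000, ignore_index=True)


def pandas_version(frame):
    # Series-based chain: mul -> cumsum -> div
    return frame["val"].mul(frame["wt"]).cumsum().div(frame["wt"].cumsum())


def numpy_version(frame):
    # Same arithmetic on the underlying arrays
    a = frame["val"].to_numpy()
    b = frame["wt"].to_numpy()
    return np.cumsum(a * b) / np.cumsum(b)


runs = 100
t_pandas = timeit.timeit(lambda: pandas_version(big), number=runs)
t_numpy = timeit.timeit(lambda: numpy_version(big), number=runs)
print(f"pandas: {t_pandas / runs * 1e6:.1f} µs per call")
print(f"numpy:  {t_numpy / runs * 1e6:.1f} µs per call")
```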
Answer 1 (score: 0)
Here is one way using numpy.
import numpy as np
def cum_wavg(arr):
    # Column 0 holds the values, column 1 the weights
    return [np.average(arr[:i+1, 0], weights=arr[:i+1, 1]) for i in range(arr.shape[0])]
df['cum_wavg'] = cum_wavg(df.values)
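Note that the list comprehension calls np.average once per row on a growing prefix, so the total work grows quadratically with the number of rows. The sketch below, using the sample data from the question, checks that it gives the same numbers as a single pass built from two cumulative sums.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"val": [100, 300, 200], "wt": [2, 3, 5]}, index=[1, 2, 3])
arr = df[["val", "wt"]].to_numpy()

# Prefix-by-prefix weighted average: one np.average call per row
looped = [
    np.average(arr[: i + 1, 0], weights=arr[: i + 1, 1])
    for i in range(arr.shape[0])
]

# Single-pass equivalent built from two cumulative sums
vectorized = np.cumsum(arr[:, 0] * arr[:, 1]) / np.cumsum(arr[:, 1])

print(looped)
print(vectorized)
```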
For better performance, you can use numba:
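import numpy as np
import pandas as pd
from numba import jit
# Repeat the sample frame 1000 times to get a larger benchmark input
df = pd.concat([df]*1000)
@jit(nopython=True)
def cum_wavg(arr, res):
    # res is unused; it is kept only because the timing call below passes it
    return np.cumsum(arr[:, 0] * arr[:, 1]) / np.cumsum(arr[:, 1])
%timeit cum_wavg(df.values, res=np.zeros(len(df.index)))  # 92.9 µs
%timeit df['val'].mul(df['wt']).cumsum().div(df['wt'].cumsum())  # 549 µs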