Is there a built-in aggregation in Pandas (or NumPy?) that I can use to optimize the line marked with *** below?
>>> import numpy as np
>>> import pandas as pd
>>>
>>> df = pd.DataFrame({'A':[1,21,4,5,3,3,5,653,2], 'B':[1,2,3,4,5,6,7,8,9]})
>>> steps = 3
>>>
>>> values = df.iloc[:,0]
>>> current = values[-steps:]
>>> old = values[:-steps]
*** >>> mean = np.array([old[i::steps].mean() for i in range(steps)]) ***
>>> df.iloc[-steps:,0] = current - mean
>>> df1 = df.iloc[-steps:]
>>> df1
       A  B
6    2.0  7
7  641.0  8
8   -1.5  9
Answer 0 (score: 3)
Since the for loop appears to be the bottleneck, we can compute mean in a vectorized way, like this:
mean = old.values.reshape(-1,steps).mean(axis=0)
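
As a quick sanity check, here is a minimal sketch that compares the reshape approach against the original loop on the data from the question. The names loop_mean and vec_mean are only illustrative, and the sketch assumes len(old) is an exact multiple of steps (otherwise reshape raises a ValueError).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 21, 4, 5, 3, 3, 5, 653, 2],
                   'B': [1, 2, 3, 4, 5, 6, 7, 8, 9]})
steps = 3

# Everything except the last `steps` rows of column A.
old = df.iloc[:-steps, 0]

# Original approach: one strided slice (and one mean) per offset.
loop_mean = np.array([old.iloc[i::steps].mean() for i in range(steps)])

# Vectorized approach: each row of the reshaped array is one block of
# `steps` consecutive values, so the column-wise mean averages the
# elements that share the same offset within a block.
vec_mean = old.values.reshape(-1, steps).mean(axis=0)

print(loop_mean)
print(vec_mean)
```

Both arrays should hold the same three per-offset means, which are then subtracted from the last steps rows as in the question.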
For the case where the array size might not be divisible by steps, we can use np.bincount:
a = old.values
ids = np.arange(a.size) % steps
mean = np.bincount(ids, a) / np.bincount(ids)
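
To illustrate the non-divisible case, the sketch below wraps the bincount idea in a small helper and checks it against the loop on a 7-element array. The helper name phase_means and the sample array are assumptions made for this example, not part of the question, and the sketch assumes the array has at least steps elements so that every offset occurs at least once.

```python
import numpy as np

def phase_means(a, steps):
    """Mean of a[i::steps] for each i in range(steps) (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    # ids[k] is the offset of element k within its block of `steps`.
    ids = np.arange(a.size) % steps
    # Weighted bincount gives the per-offset sums; the unweighted one
    # gives the per-offset counts. Their ratio is the per-offset mean.
    return np.bincount(ids, weights=a) / np.bincount(ids)

# 7 elements with steps = 3: the size is not a multiple of steps,
# so reshape(-1, steps) would fail here.
sample = np.array([1, 21, 4, 5, 3, 3, 5])
steps = 3

result = phase_means(sample, steps)
expected = np.array([sample[i::steps].mean() for i in range(steps)])

print(result)
print(expected)
```

Offset 0 averages three values here while offsets 1 and 2 average two each, which is exactly what the division by the per-offset counts accounts for.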