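I am using the pandas rolling function to generate sequential data. My main window size is 51, and from within this initial window I need to compute various metrics using different sub-windows, for example:
Dummy data: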
df = pd.DataFrame(np.random.randint(0,800,size=(1000, 3)), columns=list('ABC'))
My function:
def test(data):
    meanMov = np.zeros((51,3))
    mean = np.mean(data[0:31,:],axis=0)
    for i in range(0,16):
        meanMov[i] = mean
    mean = np.mean(data[20:50,:], axis=0)
    for i in range(35,51):
        meanMov[i] = mean
    for i in range(16,35):
        meanMov[i] = np.mean(data[(i-15):(i+15+1)], axis=0)
    return meanMov.mean()
Running the function:
r = df.rolling(51)
entr = (r.apply(test)).dropna(axis=0, how='all')
When I run the function, I get the following error:
>>> entr = (r.apply(test)).dropna(axis=0, how='all')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\window.py", line 1207, in apply
    return super(Rolling, self).apply(func, args=args, kwargs=kwargs)
  File "C:\Users\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\window.py", line 856, in apply
    center=False)
  File "C:\Users\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\window.py", line 799, in _apply
    result = np.apply_along_axis(calc, self.axis, values)
  File "C:\Users\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\lib\shape_base.py", line 116, in apply_along_axis
    res = asanyarray(func1d(inarr_view[ind0], *args, **kwargs))
  File "C:\Users\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\window.py", line 795, in calc
    closed=self.closed)
  File "C:\Users\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\window.py", line 853, in f
    offset, func, args, kwargs)
  File "pandas\_libs\window.pyx", line 1450, in pandas._libs.window.roll_generic (pandas\_libs\window.c:36061)
  File "<stdin>", line 3, in test
IndexError: too many indices for array
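How can I compute the different means for all columns and save them for further processing?
Thanks a lot!
Answer 0 (score: 0)
This may be the solution you are looking for: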
import pandas as pd
import numpy as np
# Create dummy data
df = pd.DataFrame(np.random.randint(0,800,size=(1000, 3)), columns=list('ABC'))
# To include this data into the dataframe with rolling means, start by creating a copy
df_complete = df.copy()
# Use the set of considered window sizes in this loop
for ws in [51, 45, 55]:
    r = df.rolling(window=ws, center=False).mean()
    # Give the following names to the columns with rolling windows: X_S,
    # where X - name of data column and S - current window size
    r.columns = ["%s_%d" % (c, ws) for c in r.columns]
    # Add new columns to the aggregate dataframe (align using index)
    df_complete = pd.concat([df_complete, r], axis=1)
print(df_complete.sample(5))
Sample output:
       A    B    C        A_51        B_51        C_51        A_45  \
584  169  624  332  407.372549  475.333333  355.784314  405.200000
863  477  726  218  444.980392  429.431373  458.901961  469.311111
994  162  161  301  407.843137  415.431373  396.117647  417.155556
873  600   82  413  445.137255  402.411765  471.490196  433.955556
6    381  274  681         NaN         NaN         NaN         NaN

           B_45        C_45        A_55        B_55        C_55
584  467.622222  350.755556  409.890909  462.800000  354.490909
863  448.777778  481.400000  449.418182  416.309091  448.563636
994  401.555556  400.688889  405.036364  406.309091  383.454545
873  392.822222  469.577778  454.945455  415.872727  474.327273
6           NaN         NaN         NaN         NaN         NaN
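If the rolling means are to be kept for further processing, one alternative to encoding the window size in each column name is to let pd.concat build hierarchical columns from a dict. The following is only a sketch of that idea, not part of the solution above; the names window_sizes and rolling_means are made up for illustration, and the seed is fixed only to make the run repeatable.

```python
import numpy as np
import pandas as pd

# Dummy data with a fixed seed (an assumption, for repeatability only)
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.randint(0, 800, size=(1000, 3)), columns=list('ABC'))

window_sizes = [51, 45, 55]

# The dict keys become the outer level of a column MultiIndex,
# so each result column is addressed as (window size, column name)
rolling_means = pd.concat(
    {ws: df.rolling(window=ws).mean() for ws in window_sizes}, axis=1
)

print(rolling_means[45].tail())          # columns A, B, C for window size 45
print(rolling_means[(51, 'A')].tail())   # column A for window size 51
```

Selecting rolling_means[45] returns an ordinary DataFrame with the original column names, which can be convenient when the same downstream code has to run once per window size.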
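Keep in mind that NaN values appear at the beginning of every rolling-mean column, in the first rows where fewer observations are available than the corresponding window size (the mean cannot be computed there). Once the df_complete dataframe has been created, these NaN values can be dealt with, for example with df_complete.dropna().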
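A small sketch of two ways to handle the leading NaN values, using a toy frame and a window of 4 so the effect is easy to see. The names full, trimmed and partial are illustrative only. The min_periods argument of rolling() is an alternative to dropna(): it lets shorter, partial windows produce a value, and whether that is acceptable depends on the analysis.

```python
import numpy as np
import pandas as pd

# Toy data: 10 rows, 3 columns, values 0..29
df = pd.DataFrame(np.arange(30, dtype=float).reshape(10, 3), columns=list('ABC'))

# Default behaviour: the first window - 1 rows are NaN
full = df.rolling(window=4).mean()
print(full.isna().sum())

# Option 1: drop the incomplete rows after the fact
trimmed = full.dropna()

# Option 2: accept partial windows, so no NaN rows are produced
partial = df.rolling(window=4, min_periods=1).mean()
print(partial.head())
```

With option 2 the first rows are averages over fewer than 4 observations, so they are not directly comparable to the later rows.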
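Regarding your code (specifically, the test function), I would like to point out that according to https://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.window.Rolling.apply.html, the specified function must "produce a single value from an ndarray input", whereas you are trying to return means for multiple columns. In my opinion, there is no need to write a custom function for something as common as mean().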
I also tried the rolling_mean() function suggested in the comments:
r = pd.rolling_mean(df, window=51, center=False)
But this produces a warning that recommends using the line from the solution above:
pd.rolling_mean is deprecated for DataFrame and will be removed in a future version, replace with
DataFrame.rolling(window=51,center=False).mean()
"""Entry point for launching an IPython kernel."
I hope you find the code and comments useful.