Computing on pandas rolling-window data

Time: 2018-01-31 08:40:54

Tags: python pandas rolling-sum

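I'm running into a problem processing pandas rolling-window data with a simple custom function. I have three columns of values and want to use a simple list comprehension to compute a single column from them for further processing. In my example I simply sum the values, so each window yields only one value. But the list comprehension seems to fail:

import pandas as pd
import numpy as np
from collections import Counter as count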

df = pd.DataFrame(np.random.randint(0,100,size=(50, 3)), columns=list('ABC'))

def my_test(data):
    Abs = [int(np.sqrt(x[0]**2+x[1]**2+x[2]**2)/10) for x in data]
    return sum(Abs)

entr = df.rolling(10).apply(my_test)

This is the error message I get when running the function:

entr =  df.rolling(10).apply(my_test)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\tpotrusil\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\window.py", line 1207, in apply
    return super(Rolling, self).apply(func, args=args, kwargs=kwargs)
  File "C:\Users\tpotrusil\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\window.py", line 856, in apply
    center=False)
  File "C:\Users\tpotrusil\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\window.py", line 799, in _apply
    result = np.apply_along_axis(calc, self.axis, values)
  File "C:\Users\tpotrusil\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\lib\shape_base.py", line 116, in apply_along_axis
    res = asanyarray(func1d(inarr_view[ind0], *args, **kwargs))
  File "C:\Users\tpotrusil\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\window.py", line 795, in calc
    closed=self.closed)
  File "C:\Users\tpotrusil\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\window.py", line 853, in f
    offset, func, args, kwargs)
  File "pandas\_libs\window.pyx", line 1450, in pandas._libs.window.roll_generic (pandas\_libs\window.c:36061)
  File "<stdin>", line 2, in my_test
  File "<stdin>", line 2, in <listcomp>
IndexError: invalid index to scalar variable.

Any idea how to access the rolling data?

1 answer:

Answer 0 (score: 0)

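Try this. Convert the DataFrame to a Series of per-row arrays, apply this function to each row, and then take the rolling sum: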

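def my_test(r):
    # r is one row (A, B, C): take its norm, divide by 10, truncate to int,
    # matching the formula in the question (division outside the square root)
    return int(np.sqrt(sum(r**2)) / 10)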

dfs = pd.Series(data=[df.loc[x].values for x in df.index], index=df.index)
dfs.apply(my_test).rolling(10).sum()
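
The error in the question arises because `df.rolling(10).apply(...)` calls the function once per column, passing a 1-D window of that column each time. Every `x` in the list comprehension is therefore a scalar, and `x[0]` raises the IndexError. The following is a minimal sketch of a vectorized alternative, not the only way to do it. It assumes the goal is the question's formula (row norm divided by 10, truncated to int, summed over 10 rows), a fixed seed for reproducibility, and a pandas version recent enough to accept `raw=True` in `apply`. The names `probe`, `seen_ndims` and `row_value` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Fixed seed (an assumption, only so the example is reproducible)
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.randint(0, 100, size=(50, 3)), columns=list("ABC"))

# Probe what rolling.apply hands to the function: one 1-D window per column.
seen_ndims = set()

def probe(window):
    seen_ndims.add(np.ndim(window))  # dimensionality of the window passed in
    return 0.0                       # apply expects a single number back

df.rolling(10).apply(probe, raw=True)

# Step 1: compute the per-row value across the three columns at once.
row_value = (np.sqrt((df ** 2).sum(axis=1)) / 10).astype(int)

# Step 2: rolling sum over 10 rows; the first 9 entries are NaN
# because their windows are incomplete.
entr = row_value.rolling(10).sum()
```

Computing the row-wise value first and rolling afterwards avoids needing all three columns inside the rolling function, and it also avoids building an intermediate Series of arrays.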