pandas DataFrame rolling, custom calculation, df.iloc ValueError

Time: 2020-03-01 20:53:48

Tags: pandas valueerror rolling-computation custom-function

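How should I make use of the RangeIndex provided by pandas.DataFrame.rolling inside custom_function?

The current implementation raises a ValueError.

For the first window, x.index = RangeIndex(start=0, stop=2, step=1), and tmp_df correctly selects the first and second rows of df (indices 0 and 1). For the last window, x.index = RangeIndex(start=6, stop=8, step=1), and it seems that iloc tries to select the out-of-range index 8 in df (df's index runs from 0 to 7).

Basically, what I want is a custom function that counts consecutive values within a window. Given the positive values 1, 0, 1, 1, 1, 0 in a window, the custom function should return 3, because the longest run of consecutive 1s has length 3.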
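To make the desired behaviour concrete, here is a minimal sketch of the counting step on its own, independent of any rolling window. The function name max_consecutive_ones is hypothetical and does not appear in the original code.

```python
def max_consecutive_ones(values):
    """Return the length of the longest run of 1s in a 1-D sequence."""
    longest = 0
    current = 0
    for v in values:
        if v == 1:
            # Extend the current run and remember the best one seen so far.
            current += 1
            longest = max(longest, current)
        else:
            # Any other value breaks the run.
            current = 0
    return longest


print(max_consecutive_ones([1, 0, 1, 1, 1, 0]))  # 3
```

The comparison v == 1 also holds for 1.0, so the same function should work on the float arrays that rolling windows typically produce.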

import numpy as np
import pandas as pd

df = pd.DataFrame({'open': [7, 5, 10, 11, 6, 13, 17, 12],
                   'close': [6, 6, 11, 10, 7, 15, 18, 10],
                   'positive': [0, 1, 1, 0, 1, 1, 1, 0]})

def custom_function(x,df):
    print("index:",x.index)
    # Appears to raise "ValueError: cannot set using a slice indexer with a
    # different length than the value" when x.index is
    # RangeIndex(start=6, stop=8, step=1), as df's index only goes from 0 to 7.
    tmp_df = df.iloc[x.index]

    # Do calculations on any column in tmp_df and get a result.
    result = 1  # dummy result

    return result

intervals = range(2, 10)
for i in intervals:
    df['result_' + str(i)] = np.nan
    res = df.rolling(i).apply(custom_function, args=(df,), raw=False)
    df['result_' + str(i)][1:] = res

print(df)
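One possible reading of the error, offered as an assumption rather than a confirmed diagnosis: the stop of a RangeIndex is exclusive, so RangeIndex(start=6, stop=8, step=1) covers positions 6 and 7 only, both of which are in bounds. The length mismatch more plausibly comes from the assignment, where an 8-row result is written into the 7-row slice [1:]. The sketch below rolls over the 'positive' column only and assigns the full-length result directly. The helper names longest_run_of_ones and custom_function_loc are hypothetical, and the second variant shows one way to reach other columns through x.index with df.loc.

```python
from itertools import groupby

import numpy as np
import pandas as pd

df = pd.DataFrame({'open': [7, 5, 10, 11, 6, 13, 17, 12],
                   'close': [6, 6, 11, 10, 7, 15, 18, 10],
                   'positive': [0, 1, 1, 0, 1, 1, 1, 0]})


def longest_run_of_ones(window):
    """Length of the longest run of 1s in the window (hypothetical helper)."""
    # groupby yields (value, run) pairs for consecutive equal values.
    return max((sum(1 for _ in run) for value, run in groupby(window)
                if value == 1), default=0)


def custom_function_loc(x, frame):
    """Variant that looks up the window's rows in the full frame by label."""
    tmp_df = frame.loc[x.index]  # x.index holds the labels of this window
    return longest_run_of_ones(tmp_df['positive'])


for i in range(2, 10):
    # Rolling over one column returns a Series as long as df,
    # so it can be assigned to a new column without slicing.
    df['result_' + str(i)] = df['positive'].rolling(i).apply(
        longest_run_of_ones, raw=True)

# Same calculation through the label-based variant, for window size 3.
via_loc = df['positive'].rolling(3).apply(
    custom_function_loc, args=(df,), raw=False)

print(df)
```

Rows that do not yet have a full window are left as NaN by rolling, so there is no need to pre-fill the column with np.nan. With raw=True the function receives a plain NumPy array, which is usually faster when the index is not needed.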

0 answers:

No answers