Question

Is there a workaround for pandas issue #11675?

I want to iterate over the following DataFrame and have the applied function called exactly once per row:

import pandas
from pandas import Timestamp

test_data = {
    'input': {Timestamp('2015-05-01 12:30:00'): -1.,
              Timestamp('2015-05-01 12:30:01'): 0.,
              Timestamp('2015-05-01 12:30:02'): 1.,
              Timestamp('2015-05-01 12:30:03'): 0.,
              Timestamp('2015-05-01 12:30:04'): -1.
    }
}

def main():
    side_effects = {'state': 'B'}

    def disp(row):
        print('processing row:\n%s' % row)
        if side_effects['state'] == 'A':
            output = 1.
            if row['input'] == 1.:
                side_effects['state'] = 'B'

        else:
            output = -1.
            if row['input'] == -1.:
                side_effects['state'] = 'A'

        return pandas.Series({'input': row['input'], 'state': side_effects['state'], 'output': output})

    test_data_df = pandas.DataFrame(test_data)
    print(test_data_df.apply(disp, axis=1))

main()

Currently, the function is called twice on the first row in the following environment:

python: 3.4.3.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
pandas: 0.17.0

The resulting DataFrame looks like this:

                     input  output state
2015-05-01 12:30:00     -1       1     A
2015-05-01 12:30:01      0       1     A
2015-05-01 12:30:02      1       1     B
2015-05-01 12:30:03      0      -1     B
2015-05-01 12:30:04     -1      -1     A

Note that, surprisingly, when I change the input values in the test_data dict from float to int, I get the expected result:

                     input  output state
2015-05-01 12:30:00     -1      -1     A
2015-05-01 12:30:01      0       1     A
2015-05-01 12:30:02      1       1     B
2015-05-01 12:30:03      0      -1     B
2015-05-01 12:30:04     -1      -1     A

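I understand that, as the pandas apply() documentation mentions, side effects like this should be avoided. So, in general, how can we run a state machine that takes a DataFrame column as input, given that apply() is officially not suited to the job?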
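
For reference, one way to sidestep apply() entirely is to step through the column with an explicit loop, so that each row is visited exactly once and in order. The following is only a minimal sketch under that assumption, not a fix for the issue itself; run_state_machine is a hypothetical helper name, and the transition rules are copied from disp() above.

```python
import pandas as pd


def run_state_machine(inputs, initial_state='B'):
    """Visit each input value exactly once, in order (hypothetical helper)."""
    state = initial_state
    records = []
    for value in inputs:
        # Same transition rules as disp() in the question.
        if state == 'A':
            output = 1.0
            if value == 1.0:
                state = 'B'
        else:
            output = -1.0
            if value == -1.0:
                state = 'A'
        records.append({'input': value, 'output': output, 'state': state})
    return records


index = pd.to_datetime([
    '2015-05-01 12:30:00',
    '2015-05-01 12:30:01',
    '2015-05-01 12:30:02',
    '2015-05-01 12:30:03',
    '2015-05-01 12:30:04',
])
df = pd.DataFrame({'input': [-1., 0., 1., 0., -1.]}, index=index)

# Build the result from plain records so no callback with side effects
# is ever handed to pandas.
result = pd.DataFrame(run_state_machine(df['input']), index=df.index)
print(result)
```

Because the state lives in an ordinary local variable and the loop is written by hand, the number of calls per row does not depend on how apply() probes the function or on the dtype of the column.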

Workaround for the pandas bug where apply() iterates over a phantom first row of a DataFrame

0 answers: