Pandas resample + sum aggregation: how to ignore NaNs?

Asked: 2018-01-22 14:10:44

Tags: python pandas

I am trying to generate 15-minute OHLCV data from a list of prices and amounts. Sample data:

                           price    amount
unix_timestamp                            
2018-01-05 12:33:52  15861.00000  0.194755
2018-01-05 12:33:52  15860.00000  0.050000
2018-01-05 12:33:53  15860.00000  0.100000
2018-01-05 12:33:53  15860.00000  0.234208
2018-01-05 12:33:54  15860.00000  0.021911
2018-01-05 12:33:56  15861.00000  0.205245
...

Here is how I generate the OHLCV data, using ffill to fill in the missing intervals:

ohlcv = data.resample(minutes).agg({
    "price": "ohlc",
    "amount": "sum",
}).rename(columns={'amount': 'volume'}).ffill()

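However, for intervals with no data the result contains a volume of 0, because the sum is computed as 0 there instead of being forward-filled: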

                        open     high      low    close      volume
unix_timestamp                                                     
2018-01-05 12:30:00  15861.0  15946.0  15860.0  15891.0  246.554694
2018-01-05 12:45:00  15893.0  15912.0  15780.0  15877.0  608.036132
2018-01-05 13:00:00  15877.0  15950.0  15862.0  15950.0  303.742717
2018-01-05 13:15:00  15947.0  15956.0  15900.0  15939.0  347.864213
2018-01-05 13:30:00  15947.0  15956.0  15900.0  15939.0    0.000000
2018-01-05 13:45:00  15947.0  15956.0  15900.0  15939.0    0.000000
...
2018-01-22 10:45:00  15947.0  15956.0  15900.0  15939.0    0.000000
2018-01-22 11:00:00  15947.0  15956.0  15900.0  15939.0    0.000000
2018-01-22 11:15:00  15947.0  15956.0  15900.0  15939.0    0.000000
2018-01-22 11:30:00  15947.0  15956.0  15900.0  15939.0    0.000000
2018-01-22 11:45:00  15947.0  15956.0  15900.0  15939.0    0.000000
2018-01-22 12:00:00  15947.0  15956.0  15900.0  15939.0    0.000000
2018-01-22 12:15:00  11327.0  11327.0  11250.0  11250.0  193.271647

How can I forward-fill the volume for empty intervals instead of getting zeros?

1 Answer:

Answer 0: (score: 0)

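The problem is that sum returns 0 for an interval with no values, whereas ohlc returns NaN, so ffill has nothing to fill in the volume column.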
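
To see the difference in isolation, here is a minimal sketch. The two-trade series and the names amount, totals and bars are made up for illustration and are not from the question; it assumes pandas 0.22 or later, where the sum of an empty bin is 0.

```python
import pandas as pd

# Hypothetical sample: two trades with one empty 15-minute bin between them.
idx = pd.to_datetime(["2018-01-05 12:33:52", "2018-01-05 13:03:53"])
amount = pd.Series([0.194755, 0.234208], index=idx)

# sum reports 0.0 for the empty 12:45 bin ...
totals = amount.resample("15min").sum()
print(totals)

# ... while ohlc reports NaN in every column for the same bin.
bars = amount.resample("15min").ohlc()
print(bars)
```

Because the 0.0 is a real value rather than a missing one, a later ffill leaves it untouched.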

The solution is to replace those zeros with NaN using mask, and then apply ffill:

print (data)
                       price    amount
unix_timestamp                        
2018-01-05 12:33:52  15861.0  0.194755
2018-01-05 12:33:52  15860.0  0.050000
2018-01-05 12:33:53  15860.0  0.100000
2018-01-05 13:33:53  15860.0  0.234208
2018-01-05 14:33:54  15860.0  0.021911
2018-01-05 16:33:56  15861.0  0.205245

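ohlcv = data.resample('15min').agg({
    "price": "ohlc",
    "amount": "sum",
}).rename(columns={'amount': 'volume'})

# Intervals without trades have NaN prices; use that as the mask
m = ohlcv.loc[:, ('price', 'open')].isnull()
# Replace the volume with NaN wherever the interval was empty
ohlcv.loc[:, ('volume', 'volume')] = ohlcv.loc[:, ('volume', 'volume')].mask(m)

# Now both prices and volume are NaN in empty intervals, so ffill fills both
ohlcv = ohlcv.ffill()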
print (ohlcv)
                       price                               volume
                        open     high      low    close    volume
unix_timestamp                                                   
2018-01-05 12:30:00  15861.0  15861.0  15860.0  15860.0  0.344755
2018-01-05 12:45:00  15861.0  15861.0  15860.0  15860.0  0.344755
2018-01-05 13:00:00  15861.0  15861.0  15860.0  15860.0  0.344755
2018-01-05 13:15:00  15861.0  15861.0  15860.0  15860.0  0.344755
2018-01-05 13:30:00  15860.0  15860.0  15860.0  15860.0  0.234208
2018-01-05 13:45:00  15860.0  15860.0  15860.0  15860.0  0.234208
2018-01-05 14:00:00  15860.0  15860.0  15860.0  15860.0  0.234208
2018-01-05 14:15:00  15860.0  15860.0  15860.0  15860.0  0.234208
2018-01-05 14:30:00  15860.0  15860.0  15860.0  15860.0  0.021911
2018-01-05 14:45:00  15860.0  15860.0  15860.0  15860.0  0.021911
2018-01-05 15:00:00  15860.0  15860.0  15860.0  15860.0  0.021911
2018-01-05 15:15:00  15860.0  15860.0  15860.0  15860.0  0.021911
2018-01-05 15:30:00  15860.0  15860.0  15860.0  15860.0  0.021911
2018-01-05 15:45:00  15860.0  15860.0  15860.0  15860.0  0.021911
2018-01-05 16:00:00  15860.0  15860.0  15860.0  15860.0  0.021911
2018-01-05 16:15:00  15860.0  15860.0  15860.0  15860.0  0.021911
2018-01-05 16:30:00  15861.0  15861.0  15861.0  15861.0  0.205245
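
A possible alternative, not part of the answer above, is to aggregate the two columns separately and pass min_count=1 to sum, so that empty intervals come out as NaN directly and no mask is needed. This is only a sketch: it assumes pandas 0.22 or later (where min_count is available), and the variable names price and volume are chosen here for illustration.

```python
import pandas as pd

# Sample data copied from the answer above.
idx = pd.to_datetime([
    "2018-01-05 12:33:52", "2018-01-05 12:33:52", "2018-01-05 12:33:53",
    "2018-01-05 13:33:53", "2018-01-05 14:33:54", "2018-01-05 16:33:56",
])
data = pd.DataFrame(
    {"price": [15861.0, 15860.0, 15860.0, 15860.0, 15860.0, 15861.0],
     "amount": [0.194755, 0.05, 0.1, 0.234208, 0.021911, 0.205245]},
    index=idx,
)
data.index.name = "unix_timestamp"

# Aggregate each column on its own so that sum can take min_count=1,
# which gives NaN (not 0) for intervals without any trades.
price = data["price"].resample("15min").ohlc()
volume = data["amount"].resample("15min").sum(min_count=1).rename("volume")

# Join the pieces side by side, then forward-fill the empty intervals.
ohlcv = pd.concat([price, volume], axis=1).ffill()
print(ohlcv)
```

Note that this version produces flat columns (open, high, low, close, volume) rather than the two-level columns shown above.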