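I have read the answers to "How to deal with this Pandas warning?", but I can't figure out whether I should ignore the SettingWithCopyWarning or whether I'm doing something wrong.
I have this function that resamples some data to a specific time frame (e.g. 1 hour) and then fills in the NaN values accordingly.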
def resample_data(raw_data, time_frame):
    # resamples the ticker data in ohlc
    ohlc_dict = {
        'open': 'first',
        'high': 'max',
        'low': 'min',
        'close': 'last',
        'price': 'mean'
    }
    volume_dict = {'volume': 'sum', 'volume_quote': 'sum'}
    resampled_data = raw_data.resample(time_frame, how={'price': ohlc_dict, 'amount': volume_dict})
    resampled_data['amount'] = resampled_data['amount']['volume'].fillna(0.0)
    resampled_data['amount']['volume_quote'] = resampled_data['amount']['volume']
    resampled_data['price']['close'] = resampled_data['price']['close'].fillna(method='pad')
    resampled_data['price']['open'] = resampled_data['price']['open'].fillna(resampled_data['price']['close'])
    resampled_data['price']['high'] = resampled_data['price']['high'].fillna(resampled_data['price']['close'])
    resampled_data['price']['low'] = resampled_data['price']['low'].fillna(resampled_data['price']['close'])
    resampled_data['price']['price'] = resampled_data['price']['price'].fillna(resampled_data['price']['close'])
    # ugly hack to remove multi index, must be better way
    output_data = resampled_data['price']
    output_data['volume'] = resampled_data['amount']['volume']
    output_data['volume_quote'] = resampled_data['amount']['volume_quote']
    return output_data
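To see why the fill step is needed at all, here is a small sketch on hypothetical tick data (the timestamps and values are made up; the column names price and amount are taken from the how= dict above). It uses the method-style resample API, since current pandas versions no longer accept the how= keyword. A bucket with no trades comes back as NaN for first/last but as 0.0 for sum, so only the price fields need filling.

```python
import pandas as pd

# Hypothetical tick data: three trades, and no trades at all between 01:00 and 02:00.
idx = pd.to_datetime(["2024-01-01 00:10", "2024-01-01 00:40", "2024-01-01 02:05"])
raw = pd.DataFrame({"price": [100.0, 101.0, 99.0], "amount": [1.0, 2.0, 0.5]}, index=idx)

hourly = raw.resample("1h")
open_ = hourly["price"].first()  # NaN for the empty 01:00 bucket
close = hourly["price"].last()   # NaN for the empty 01:00 bucket
volume = hourly["amount"].sum()  # the empty bucket sums to 0.0, so it needs no fill

# Same fill logic as the function above: carry the last close forward,
# then use that close for the other price fields of empty buckets.
close_filled = close.ffill()
open_filled = open_.fillna(close_filled)
```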
Is this the right way to do it? Should I just ignore the warning?
Edit: if I try to use .loc as suggested in the warning:
resampled_data = raw_data.resample(time_frame, how={'price': ohlc_dict, 'amount': volume_dict})
resampled_data.loc['amount'] = resampled_data['amount']['volume'].fillna(0.0)
resampled_data.loc['amount']['volume_quote'] = resampled_data['amount']['volume']
resampled_data.loc['price']['close'] = resampled_data['price']['close'].fillna(method='pad')
resampled_data.loc['price']['open'] = resampled_data['price']['open'].fillna(resampled_data['price']['close'])
resampled_data.loc['price']['high'] = resampled_data['price']['high'].fillna(resampled_data['price']['close'])
resampled_data.loc['price']['low'] = resampled_data['price']['low'].fillna(resampled_data['price']['close'])
resampled_data.loc['price']['price'] = resampled_data['price']['price'].fillna(resampled_data['price']['close'])
I get the following error, referring to the line resampled_data.loc['price']['close'] = resampled_data['price']['close'].fillna(method='pad'):
KeyError: 'the label [price] is not in the [index]'
Answer (score: 2)
As Jeff pointed out, since the columns are a MultiIndex, you should use a tuple to access them:
resampled_data['price']['close']  # rather than this chained form
resampled_data[('price', 'close')]  # one lookup with a tuple key
resampled_data.loc[:, ('price', 'close')]  # equivalent
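A small check on a hypothetical frame with two-level columns (the data is made up): all three expressions select the same values. The difference only matters when assigning, because the tuple forms are a single indexing operation on the frame itself, while the chained form first builds an intermediate object.

```python
import pandas as pd

# Hypothetical stand-in for resampled_data: DatetimeIndex rows, two-level columns.
cols = pd.MultiIndex.from_tuples([("price", "open"), ("price", "close"), ("amount", "volume")])
df = pd.DataFrame(
    [[100.0, 101.0, 3.0], [102.0, 103.0, 0.0]],
    index=pd.date_range("2024-01-01", periods=2, freq="h"),
    columns=cols,
)

chained = df["price"]["close"]          # two indexing steps
by_tuple = df[("price", "close")]       # one step: the tuple names a single column
by_loc = df.loc[:, ("price", "close")]  # one step: all rows, column given as a tuple
```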
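This also removes the ambiguity between columns and rows. With plain labels, .loc reads the first label as a row and the second as a column, so
resampled_data.loc['close', 'price']
looks for a row labelled 'close' and a column 'price'. (A row-first lookup like this is what pandas was attempting when it raised the KeyError.)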
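A sketch of that row-then-column order on a hypothetical frame: the full column key goes in the second position as a tuple, while a bare string in the first position is looked up among the row labels and fails.

```python
import pandas as pd

# Hypothetical frame: two hourly rows, two-level columns.
cols = pd.MultiIndex.from_tuples([("price", "open"), ("price", "close")])
rows = pd.date_range("2024-01-01", periods=2, freq="h")
df = pd.DataFrame([[100.0, 101.0], [102.0, 103.0]], index=rows, columns=cols)

# Row label first, full column tuple second: selects a single cell.
cell = df.loc[rows[1], ("price", "close")]

# Here "close" sits in the row position, so pandas searches the row
# index (timestamps) for it and raises KeyError.
try:
    df.loc["close", "price"]
    row_lookup_failed = False
except KeyError:
    row_lookup_failed = True
```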
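You will usually see the SettingWithCopyWarning when your code chains consecutive [] lookups, and it is best to combine them into a single indexing call, e.g. one .loc[...] with a (row, column) key. Adding a second [] after .loc, as in the question's edit, is still chained indexing: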
resampled_data.loc['price']['close'] = ... # this *may* set to a copy
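If you set on a copy (the expression above is sometimes not actually a copy, but pandas does not guarantee it either way), the copy is updated correctly but is then immediately garbage-collected, so resampled_data itself never changes.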
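A sketch of that difference on hypothetical data. Whether chained indexing returns a view or a copy is not guaranteed and differs between pandas versions (for example with Copy-on-Write enabled), so the sketch makes the copy explicit with .copy() instead of relying on chained indexing; the second half shows the single-step .loc assignment, which writes into the original frame.

```python
import pandas as pd

# Hypothetical frame with a missing close in the middle row.
cols = pd.MultiIndex.from_tuples([("price", "open"), ("price", "close")])
df = pd.DataFrame(
    [[100.0, 101.0], [float("nan"), float("nan")], [99.0, 99.5]],
    index=pd.date_range("2024-01-01", periods=3, freq="h"),
    columns=cols,
)

# Setting on a copy: the copy is filled, but df itself does not change,
# and the copy is discarded once nothing refers to it.
close_copy = df[("price", "close")].copy()
close_copy.iloc[1] = 101.0
still_missing = int(df[("price", "close")].isna().sum())

# One indexing operation on df itself: this writes into df.
df.loc[:, ("price", "close")] = df[("price", "close")].ffill()
now_missing = int(df[("price", "close")].isna().sum())
```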
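As an aside: as mentioned in the comments, resample offers how='ohlc', so you may be better off using that, then applying the fills, and finally joining the resampled volume.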
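A minimal sketch of that approach with the current API, where how='ohlc' has become the .ohlc() method on the resampler. It assumes, as the question's how= dict implies, that raw_data has a DatetimeIndex and columns price and amount. The helper name resample_ohlcv and the sample data are hypothetical, and the question's extra price (mean) and volume_quote columns are left out for brevity.

```python
import pandas as pd

def resample_ohlcv(raw_data: pd.DataFrame, time_frame: str) -> pd.DataFrame:
    """Hypothetical helper: OHLC bars plus summed volume, with flat columns."""
    bars = raw_data["price"].resample(time_frame).ohlc()  # columns: open, high, low, close
    bars["close"] = bars["close"].ffill()                 # carry the last close into empty bars
    for col in ("open", "high", "low"):
        bars[col] = bars[col].fillna(bars["close"])       # an empty bar is flat at that close
    volume = raw_data["amount"].resample(time_frame).sum().rename("volume")
    return bars.join(volume)                              # same bin index, so rows line up

# Usage with made-up ticks; 01:00-02:00 has no trades.
idx = pd.to_datetime(["2024-01-01 00:10", "2024-01-01 00:40", "2024-01-01 02:05"])
raw = pd.DataFrame({"price": [100.0, 101.0, 99.0], "amount": [1.0, 2.0, 0.5]}, index=idx)
out = resample_ohlcv(raw, "1h")
```

Because every column is assigned on a DataFrame the function built itself, there is no MultiIndex to strip afterwards and no chained assignment anywhere.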