How to deal with SettingWithCopyWarning in this case

Time: 2014-04-05 13:56:40

Tags: python pandas resampling

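I've read the answers to "How to deal with this Pandas warning?", but I can't work out whether I should ignore the SettingWithCopyWarning or whether I'm doing something wrong.

I have this function that resamples some data to a specific time frame (e.g. 1 hour) and then fills in the NaN values accordingly.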

def resample_data(raw_data, time_frame):
    # resamples the ticker data in ohlc
    ohlc_dict = {
        'open': 'first',
        'high': 'max',
        'low': 'min',
        'close': 'last',
        'price': 'mean'
    }

    volume_dict = {'volume': 'sum', 'volume_quote': 'sum'}

    resampled_data = raw_data.resample(time_frame, how={'price': ohlc_dict, 'amount': volume_dict})
    resampled_data['amount'] = resampled_data['amount']['volume'].fillna(0.0)
    resampled_data['amount']['volume_quote'] = resampled_data['amount']['volume']
    resampled_data['price']['close'] = resampled_data['price']['close'].fillna(method='pad')
    resampled_data['price']['open'] = resampled_data['price']['open'].fillna(resampled_data['price']['close'])
    resampled_data['price']['high'] = resampled_data['price']['high'].fillna(resampled_data['price']['close'])
    resampled_data['price']['low'] = resampled_data['price']['low'].fillna(resampled_data['price']['close'])
    resampled_data['price']['price'] = resampled_data['price']['price'].fillna(resampled_data['price']['close'])

    # ugly hack to remove multi index, must be better way
    output_data = resampled_data['price']
    output_data['volume'] = resampled_data['amount']['volume']
    output_data['volume_quote'] = resampled_data['amount']['volume_quote']

    return output_data

Is this the right way to do it? Should I ignore the warning?

Edit: If I try to use .loc as suggested in the warning:

resampled_data = raw_data.resample(time_frame, how={'price': ohlc_dict, 'amount': volume_dict})
resampled_data.loc['amount'] = resampled_data['amount']['volume'].fillna(0.0)
resampled_data.loc['amount']['volume_quote'] = resampled_data['amount']['volume']
resampled_data.loc['price']['close'] = resampled_data['price']['close'].fillna(method='pad')
resampled_data.loc['price']['open'] = resampled_data['price']['open'].fillna(resampled_data['price']['close'])
resampled_data.loc['price']['high'] = resampled_data['price']['high'].fillna(resampled_data['price']['close'])
resampled_data.loc['price']['low'] = resampled_data['price']['low'].fillna(resampled_data['price']['close'])
resampled_data.loc['price']['price'] = resampled_data['price']['price'].fillna(resampled_data['price']['close'])

I get the following error, which refers to the line resampled_data.loc['price']['close'] = resampled_data['price']['close'].fillna(method='pad')

KeyError: 'the label [price] is not in the [index]'

1 Answer:

Answer 0 (score: 2)

As Jeff points out, since the columns are a MultiIndex, you should use a tuple to access a column. That is, rather than the chained form

resampled_data['price']['close']

use

resampled_data[('price', 'close')]
resampled_data.loc[:, ('price', 'close')]  # equivalent

This also removes the ambiguity between columns and rows:

resampled_data.loc['close', 'price']

(This is what pandas was trying to do when it raised the KeyError.)
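To make the difference concrete, here is a minimal sketch. The small frame is a hypothetical stand-in for resampled_data (its values and timestamps are invented for illustration); only the shape of the columns matters.

```python
import pandas as pd

# Hypothetical stand-in for resampled_data: a frame with two-level columns.
columns = pd.MultiIndex.from_tuples(
    [("price", "open"), ("price", "close"), ("amount", "volume")]
)
index = pd.to_datetime(["2014-04-05 13:00", "2014-04-05 14:00"])
resampled_data = pd.DataFrame(
    [[10.0, 11.0, 5.0], [11.0, 12.0, 7.0]], index=index, columns=columns
)

# A tuple names one column unambiguously; both forms return the same Series.
by_getitem = resampled_data[("price", "close")]
by_loc = resampled_data.loc[:, ("price", "close")]

# A bare label passed to .loc is looked up among the rows, not the columns,
# so asking for 'price' fails because no row carries that label.
try:
    resampled_data.loc["price"]
    row_lookup_failed = False
except KeyError:
    row_lookup_failed = True
```

Here row_lookup_failed ends up True, which mirrors the KeyError from the question.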

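You'll often see the SettingWithCopy warning if you use consecutive [] in your code; it is best to combine them into a single [], e.g. using loc with a tuple as above. A chained assignment such as this one is the pattern to avoid: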

resampled_data.loc['price']['close'] = ... # this *may* set to a copy

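If you do set to a copy (sometimes the above may not actually be a copy, but pandas makes no guarantee here), the copy is updated correctly but then immediately garbage collected, so the original frame is left unchanged.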
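As a rough sketch of the single-step form, the fills from the question could be written with one .loc call per assignment. The frame below is again a made-up stand-in for resampled_data, and .ffill() is used as the equivalent of fillna(method='pad'), which recent pandas versions deprecate.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for resampled_data with an empty middle bin.
columns = pd.MultiIndex.from_tuples([("price", "open"), ("price", "close")])
index = pd.to_datetime(
    ["2014-04-05 13:00", "2014-04-05 14:00", "2014-04-05 15:00"]
)
resampled_data = pd.DataFrame(
    [[10.0, 11.0], [np.nan, np.nan], [12.0, 13.0]], index=index, columns=columns
)

close = ("price", "close")
open_ = ("price", "open")

# One .loc call per assignment, so pandas writes into resampled_data itself
# rather than into a temporary object produced by a first [] lookup.
resampled_data.loc[:, close] = resampled_data[close].ffill()
resampled_data.loc[:, open_] = resampled_data[open_].fillna(resampled_data[close])
```

The same pattern would apply to the 'high', 'low' and 'price' columns.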

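As an aside: as mentioned in the comments, resample offers how='ohlc', so you may be best off doing that, padding, filling, and then joining with the resampled volume.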
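One possible shape for that approach is sketched below. It assumes a recent pandas, where the how= argument no longer exists and the aggregation is called as a method on the resampler (.ohlc(), .mean(), .sum()). The function name resample_ohlc and the sample data are hypothetical, and the volume_quote column from the question is left out for brevity.

```python
import pandas as pd


def resample_ohlc(raw_data, time_frame):
    """Resample tick data into OHLC bars with a mean price and summed volume."""
    prices = raw_data["price"].resample(time_frame)
    output = prices.ohlc()  # single-level columns: open, high, low, close
    output["price"] = prices.mean()

    # Pad the close forward, then fill the other gaps in empty bins with it.
    output["close"] = output["close"].ffill()
    for name in ["open", "high", "low", "price"]:
        output[name] = output[name].fillna(output["close"])

    # Join with the resampled volume; empty bins count as zero volume.
    volume = raw_data["amount"].resample(time_frame).sum().fillna(0.0)
    return output.join(volume.rename("volume"))


# Made-up ticks with no trades between 14:00 and 15:00.
raw_data = pd.DataFrame(
    {"price": [10.0, 12.0, 11.0, 14.0], "amount": [1.0, 2.0, 3.0, 4.0]},
    index=pd.to_datetime(
        ["2014-04-05 13:05", "2014-04-05 13:40",
         "2014-04-05 13:55", "2014-04-05 15:10"]
    ),
)
bars = resample_ohlc(raw_data, "1h")
```

Because every assignment targets a single-level column of a frame the function created itself, no chained indexing is involved and the warning should not appear.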