I have to fix some legacy code that processes daily sample data, which looks like this:
sample_data = [
    {
        'id': 10,
        'name': 'example',
        'tags': '["one", "two"]',  # json encoded
        '2016-12-20': 2,
        '2016-12-21': 3,
        '2016-12-22': 10,
        '2016-12-23': 4,
        '2016-12-24': 7,
        '2016-12-25': 5,
        '2016-12-26': 1,
        '2016-12-27': 6,
        '2016-12-28': 4,
        '2016-12-29': 3,
        '2016-12-30': 1,
    },
    {
        'id': 11,
        'name': None,
        'tags': '["one"]',  # json encoded
        '2016-12-20': 6,
        '2016-12-21': 10,
        '2016-12-22': 190,
        '2016-12-23': 77,
        '2016-12-24': 35,
        '2016-12-25': 346,
        '2016-12-26': 6,
        '2016-12-27': 9,
        '2016-12-28': 8,
        '2016-12-29': 3,
        '2016-12-30': 0,
    }
]
The legacy code is supposed to turn this daily data into weekly means. The code itself looks like this:
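import pandas as pd

# One row per record; the date-named keys become columns
df = pd.DataFrame(data=sample_data)
df.set_index(['id', 'name', 'tags'], inplace=True)
# Parse the remaining column labels (the dates) into datetimes
df.columns = pd.to_datetime(df.columns)
df = df.replace(0, 1000)
# Transpose so the dates form the index, then group into weekly bins
df = df.T.resample('W')
df = df.mean()  # this is the call that raises
df.index = df.index.strftime('%Y-%m-%d')
df = df.round()
df = df.fillna(method='ffill')
# Back to one dict per record
result = df.T.reset_index().to_dict(orient='records')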
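For reference, here is a stripped-down sketch of what I understand the code is meant to compute. It is a simplification, not the legacy code: it uses only the numeric values from the sample above, takes the `id` values as column labels, and leaves `name` and `tags` out entirely, so there is no MultiIndex involved. The names `daily` and `weekly` are made up for this illustration.

```python
import pandas as pd

# Daily values for the two sample records, keyed by id (name/tags omitted
# on purpose; this is an assumption made to keep the sketch small).
daily = pd.DataFrame(
    {
        10: [2, 3, 10, 4, 7, 5, 1, 6, 4, 3, 1],
        11: [6, 10, 190, 77, 35, 346, 6, 9, 8, 3, 0],
    },
    index=pd.date_range("2016-12-20", "2016-12-30", freq="D"),
)

# Same steps as the legacy code: zeros become 1000, then weekly means, rounded.
weekly = daily.replace(0, 1000).resample("W").mean().round()
weekly.index = weekly.index.strftime("%Y-%m-%d")

print(weekly)
```

With `'W'`, pandas bins by weeks ending on Sunday and labels each bin with that Sunday, so the eleven days above fall into two bins labelled `2016-12-25` and `2017-01-01`.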
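However, I am getting an error during execution. The code processes a large amount of data (> 10k rows), and the error seems to occur only occasionally. The traceback is as follows: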
  File "[...]/api/helpers.py", line 277, in resample
    df = df.mean()
  File "[...]/lib/python2.7/site-packages/pandas/tseries/resample.py", line 540, in f
    return self._downsample(_method)
  File "[...]/lib/python2.7/site-packages/pandas/tseries/resample.py", line 693, in _downsample
    self.grouper, axis=self.axis).aggregate(how, **kwargs)
  File "[...]/lib/python2.7/site-packages/pandas/core/groupby.py", line 3704, in aggregate
    return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
  File "[...]/lib/python2.7/site-packages/pandas/core/groupby.py", line 3193, in aggregate
    result, how = self._aggregate(arg, _level=_level, *args, **kwargs)
  File "[...]/lib/python2.7/site-packages/pandas/core/base.py", line 432, in _aggregate
    return getattr(self, arg)(*args, **kwargs), None
  File "[...]/lib/python2.7/site-packages/pandas/core/groupby.py", line 1047, in median
    return self._python_agg_general(f)
  File "[...]/lib/python2.7/site-packages/pandas/core/groupby.py", line 818, in _python_agg_general
    for name, obj in self._iterate_slices():
  File "[...]/lib/python2.7/site-packages/pandas/core/groupby.py", line 3123, in _iterate_slices
    yield val, slicer(val)
  File "[...]/lib/python2.7/site-packages/pandas/core/groupby.py", line 3115, in <lambda>
    slicer = lambda x: self.obj[x]
  File "[...]/lib/python2.7/site-packages/pandas/core/frame.py", line 2057, in __getitem__
    return self._getitem_multilevel(key)
  File "[...]/lib/python2.7/site-packages/pandas/core/frame.py", line 2101, in _getitem_multilevel
    loc = self.columns.get_loc(key)
  File "[...]/lib/python2.7/site-packages/pandas/indexes/multi.py", line 1686, in get_loc
    mask = self.labels[i][loc] == self.levels[i].get_loc(k)
  File "[...]/lib/python2.7/site-packages/pandas/indexes/base.py", line 2136, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas/index.c:4145)
  File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:4009)
  File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13166)
  File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13120)
KeyError: <type 'object'>
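Because the error only shows up on some inputs, one way to narrow it down might be to run the processing step on consecutive chunks of the records and note which chunks raise. The sketch below is only an illustration of that idea: `find_failing_chunks` and `toy_process` are hypothetical names, and `toy_process` is a stand-in that fails on an arbitrary record, not the real resampling function and not a claim about what actually causes the `KeyError`.

```python
def find_failing_chunks(records, process, chunk_size=1000):
    """Run `process` on consecutive chunks and collect the ones that raise.

    Returns a list of (start, stop, exception) tuples, where start/stop are
    positions in `records` (stop is exclusive).
    """
    failures = []
    for start in range(0, len(records), chunk_size):
        chunk = records[start:start + chunk_size]
        try:
            process(chunk)
        except Exception as exc:
            failures.append((start, start + len(chunk), exc))
    return failures


def toy_process(chunk):
    # Hypothetical stand-in for the real function: fails on one arbitrary id.
    for row in chunk:
        if row["id"] == 7:
            raise KeyError(row["id"])


rows = [{"id": i, "name": "example"} for i in range(10)]
failures = find_failing_chunks(rows, toy_process, chunk_size=4)

for start, stop, exc in failures:
    print("records[%d:%d] failed with %r" % (start, stop, exc))
```

Rerunning with a smaller `chunk_size` on a failing range would shrink it further, down to individual records if needed.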
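No matter what I try, I can't seem to work around it, and I'm not very experienced with pandas. Is there something wrong with the code that I haven't noticed? Thank you for your time.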