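I have a list of data read from MongoDB. A subset of the data can be found in this gist. I am creating a DataFrame from this list, using the Date field to build a DatetimeIndex. The dates were originally recorded in my local timezone, but they are stored in Mongo without any timezone information, so I corrected for DST as suggested here.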
from datetime import datetime
from dateutil import tz
import pandas as pd
# data is the list from the gist
dates = [x['Date'] for x in data]
idx = pd.DatetimeIndex(dates, freq='D')
# The stored values are naive: mark them as UTC, then convert to local time
idx = idx.tz_localize(tz=tz.tzutc())
idx = idx.tz_convert(tz='Europe/Dublin')
# Reset every timestamp to local midnight
idx = idx.normalize()
frame = pd.DataFrame(data, index=idx)
frame = frame.drop('Date', axis=1)
Everything seems fine, and my frame looks like this:
Events ID
2008-03-31 00:00:00+01:00 0.0 116927302
2008-03-30 00:00:00+00:00 2401.0 116927302
2008-03-31 00:00:00+01:00 0.0 116927307
2008-03-30 00:00:00+00:00 0.0 116927307
2008-03-31 00:00:00+01:00 0.0 121126919
2008-03-30 00:00:00+00:00 1019.0 121126919
2008-03-30 00:00:00+00:00 0.0 121126922
2008-03-31 00:00:00+01:00 0.0 121126922
2008-03-30 00:00:00+00:00 0.0 121127133
2008-03-31 00:00:00+01:00 0.0 121127133
2008-03-31 00:00:00+01:00 0.0 131677370
2008-03-30 00:00:00+00:00 0.0 131677370
2008-03-30 00:00:00+00:00 0.0 131677416
2008-03-31 00:00:00+01:00 0.0 131677416
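Now I want to create a MultiIndex from the original DatetimeIndex and the ID column, as shown here. However, when I try this, I get an error that was not raised when the DatetimeIndex was first created: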
frame.set_index([frame.ID, idx])
NonExistentTimeError: 2008-03-30 01:00:00
If I call frame.set_index(idx) without the MultiIndex, no error is raised.
Versions
Answer 0 (score: 1)
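You need to call sort_index first, then add the ID column to the index:
# Sort by the existing DatetimeIndex
frame = frame.sort_index()
# Append ID as a second index level (ID is removed from the columns)
frame.set_index('ID', append=True, inplace=True)
print(frame)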
Events
ID
2008-03-30 00:00:00+00:00 168445814 0.0
168445633 0.0
168445653 0.0
245514429 0.0
168445739 0.0
168445810 0.0
332955940 0.0
168445875 0.0
168445628 0.0
217596128 1779.0
177336685 0.0
180799848 0.0
215797757 0.0
180800351 1657.0
183192871 0.0
...
...
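The two steps above can be tried end to end on a small frame. This is a minimal sketch, assuming four made-up rows shaped like the question's output (not the gist data) and no freq argument; it shows the shape of the result rather than reproducing the original error.

```python
import pandas as pd

# Made-up rows standing in for the gist data.
data = [
    {"Date": "2008-03-31", "Events": 0.0, "ID": 116927302},
    {"Date": "2008-03-30", "Events": 2401.0, "ID": 116927302},
    {"Date": "2008-03-31", "Events": 0.0, "ID": 116927307},
    {"Date": "2008-03-30", "Events": 0.0, "ID": 116927307},
]

# Naive dates -> UTC -> local time -> local midnight, as in the question.
idx = pd.DatetimeIndex([row["Date"] for row in data])
idx = idx.tz_localize("UTC").tz_convert("Europe/Dublin").normalize()

frame = pd.DataFrame(data, index=idx).drop(columns="Date")

# Sort on the dates first, then append ID as a second index level.
frame = frame.sort_index()
frame = frame.set_index("ID", append=True)
print(frame)
```

Assigning the result of set_index instead of using inplace=True is a style choice here; both forms build the same MultiIndex.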
If you need the levels in the other order, use DataFrame.swaplevel:
frame = frame.sort_index()
frame.set_index('ID', append=True, inplace=True)
# Put ID first and the dates second
frame = frame.swaplevel(0, 1)
print(frame)
Events
ID
168445814 2008-03-30 00:00:00+00:00 0.0
168445633 2008-03-30 00:00:00+00:00 0.0
168445653 2008-03-30 00:00:00+00:00 0.0
245514429 2008-03-30 00:00:00+00:00 0.0
168445739 2008-03-30 00:00:00+00:00 0.0
168445810 2008-03-30 00:00:00+00:00 0.0
332955940 2008-03-30 00:00:00+00:00 0.0
168445875 2008-03-30 00:00:00+00:00 0.0
168445628 2008-03-30 00:00:00+00:00 0.0
217596128 2008-03-30 00:00:00+00:00 1779.0
177336685 2008-03-30 00:00:00+00:00 0.0
180799848 2008-03-30 00:00:00+00:00 0.0
215797757 2008-03-30 00:00:00+00:00 0.0
180800351 2008-03-30 00:00:00+00:00 1657.0
183192871 2008-03-30 00:00:00+00:00 0.0
186439064 2008-03-30 00:00:00+00:00 0.0
199856024 2008-03-30 00:00:00+00:00 0.0
...
...
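Note that swaplevel only exchanges the positions of the index levels; it does not reorder rows, which is why the output above is still ordered by date. A small sketch on made-up values (the IDs are borrowed from the output above, the Events values and the "Date" level name are assumptions for illustration):

```python
import pandas as pd

dates = pd.DatetimeIndex(
    ["2008-03-30", "2008-03-30", "2008-03-31"], tz="Europe/Dublin", name="Date"
)
frame = pd.DataFrame(
    {"ID": [168445814, 168445633, 168445814], "Events": [0.0, 0.0, 5.0]},
    index=dates,
)

frame = frame.sort_index().set_index("ID", append=True)

# Levels become (ID, Date); the row order is unchanged.
swapped = frame.swaplevel(0, 1)

# A second sort_index is needed if rows should be grouped by ID.
by_id = swapped.sort_index()
print(swapped)
print(by_id)
```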
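If you need to copy the column into the index while keeping it as a column, pass the Series itself with set_index(frame.ID, ...):
frame = frame.sort_index()
# Passing the Series (not the label) leaves the ID column in place
frame.set_index(frame.ID, append=True, inplace=True)
frame = frame.swaplevel(0, 1)
print(frame)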
Events ID
ID
168445814 2008-03-30 00:00:00+00:00 0.0 168445814
168445633 2008-03-30 00:00:00+00:00 0.0 168445633
168445653 2008-03-30 00:00:00+00:00 0.0 168445653
245514429 2008-03-30 00:00:00+00:00 0.0 245514429
168445739 2008-03-30 00:00:00+00:00 0.0 168445739
168445810 2008-03-30 00:00:00+00:00 0.0 168445810
332955940 2008-03-30 00:00:00+00:00 0.0 332955940
168445875 2008-03-30 00:00:00+00:00 0.0 168445875
168445628 2008-03-30 00:00:00+00:00 0.0 168445628
217596128 2008-03-30 00:00:00+00:00 1779.0 217596128
177336685 2008-03-30 00:00:00+00:00 0.0 177336685
180799848 2008-03-30 00:00:00+00:00 0.0 180799848
215797757 2008-03-30 00:00:00+00:00 0.0 215797757
180800351 2008-03-30 00:00:00+00:00 1657.0 180800351
183192871 2008-03-30 00:00:00+00:00 0.0 183192871
186439064 2008-03-30 00:00:00+00:00 0.0 186439064
...
...
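An equivalent way to keep the column, sketched here on a two-row made-up frame, is the drop=False parameter of set_index. Both calls should give an ID index level plus an ID column.

```python
import pandas as pd

frame = pd.DataFrame(
    {"ID": [1, 2], "Events": [0.0, 3.0]},
    index=pd.DatetimeIndex(["2008-03-30", "2008-03-31"], tz="Europe/Dublin"),
)

# Variant from the answer: pass the Series, so the column is not consumed.
kept_series = frame.set_index(frame.ID, append=True)

# Alternative: pass the label and ask pandas not to drop the column.
kept_flag = frame.set_index("ID", append=True, drop=False)

print(kept_series)
print(kept_flag)
```

Keep in mind that a name shared by an index level and a column can be ambiguous in later label-based operations, so the copy is worth keeping only if the column is really needed.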