使用MultiIndex时,在Pandas中按年份进行索引/选择

时间:2015-02-24 23:14:04

标签: python datetime indexing pandas dataframe

长期潜伏者试图在这里更积极地参与社区活动。

这里是:

当我将数据帧的索引设置为单级DatetimeIndex时,一切都按预期工作。当我按下面的索引时,我得到以下回报:

输入:

sample_Email_click2 = sample_Email_click.reset_index(drop=True)
sample_Email_click2.set_index('Date_Comb', inplace=True, drop=False)
sample_Email_click2

输出:

               Date_Comb              Campaign_CD

Date_Comb 2012-10-11 05:18:05 2012-10-11 05:18:05 2842

             2012-09-27 15:04:12    2012-09-27 15:04:12 2805

以我期望的格式产生数据帧......

sample_Email_click2.ix["2012"].head(2).ix[:,:2].index

<class 'pandas.tseries.index.DatetimeIndex'>
enter code here`[2012-10-11 05:18:05, 2012-09-27 15:04:12]
Length: 2, Freq: None, Timezone: None`

然而,当我尝试引入MultiIndex并使用.ix和.xs方法时,事情似乎出了问题。

print sample_Email_click2.index.levels[0]
print sample_Email_click2.index.levels[1]

输出:

<class 'pandas.tseries.index.DatetimeIndex'>
[2012-03-03 01:25:55, ..., 2014-07-31 23:49:53]
Length: 562933, Freq: None, Timezone: None

Index([u'10003341', u'10006447', u'10009143', u'10009338', u'10010524', u'10018143', u'10020892', u'10038064', u'10048634', u'10055227', u'10056611', u'10057317', u'10063204', u'10070907', u'10072846', u'10078439', u'10078571', u'10079289', u'10086611', u'10097987', u'10101451', u'10102952', u'10110609', u'10118756', u'10127396', u'10130253', u'10144913', u'10149927', u'10157760', u'10172201', u'10173908', u'10179552', u'10184175', u'10189897', u'10190549', u'10191493', u'10197319', u'10210260', u'10217133', u'10217986', u'10218016', u'10232329', u'10240641', u'10244519', u'10244922', u'10248604', u'10248909', u'10257722', u'10271347', u'10282045', u'10284030', u'10289296', u'10295972', u'10305340', u'10326711', u'10332290', u'10343444', u'10343976', u'10357733', u'10360593', u'10366234', u'10376405', u'10376704', u'10379122', u'10386833', u'10391989', u'10398309', u'10405057', u'10406239', u'10411259', u'10414566', u'10418969', u'10419291', u'10424138', u'10435470', u'10440233', u'10448304', u'10466519', u'10470115', u'10476204', u'10480159', u'10481054', u'10484122', u'10487410', u'10487540', u'10490374', u'10495338', u'105004023', u'105004035', u'105004071', u'105004316', u'105013017', u'105014110', u'105014144', u'105016206', u'105017011', u'105017739', u'105018514', u'105018717', u'105018810', ...], dtype='object')

作为参考,我的索引名称是:

sample_Email_click2.index.names
FrozenList([u'Date_Comb', u'MATCHED_CUSTOMER_ID'])

所以我希望以下任何一项都能/可行:

sample_Email_click2.ix["2012"]
sample_Email_click2.ix["2012", :]
sample_Email_click2.loc["2012"]
sample_Email_click2.xs("2012", level="Date_Comb")
sample_Email_click2.xs("2012", level=0)

为了确保我不是完全疯了,这确实有效:

sample_Email_click2.xs("10240641", level=1).ix["2012"]

                       Date_Comb              Campaign_CD
Date_Comb       
2012-08-09 17:01:35 2012-08-09 17:01:35       2648
2012-12-12 17:44:54 2012-12-12 17:44:54       3056

有人可以帮我弄清楚如何在MultiIndex中使用DatetimeIndex,而不必查询精确的时间戳(因为我怀疑pd.to_datetime(2012,08,09等)可以工作...

提前非常感谢你。

1 个答案:

答案 0 :(得分:1)

在使用MultIndex时,我认为这种形式的“模糊索引”是不可能的(但是?)。但是,作为一种变通方法,您可以从多索引中提取单级索引,并使用其get_loc方法查找与2012年所需日期对应的整数索引:

例如,

index = pd.date_range('2012-3-3', '2014-07-31', freq='M')
N = len(index)
df = pd.DataFrame({'id': range(N), 'CD': np.random.randint(10, N)}, index=index)
df = df.set_index(['id'], append=True)
dtindex = df.index.get_level_values(0)
result = df.iloc[dtindex.get_loc('2012')]
print(result)

产量

               CD
           id    
2012-03-31 0   22
2012-04-30 1   22
2012-05-31 2   22
2012-06-30 3   22
2012-07-31 4   22
2012-08-31 5   22
2012-09-30 6   22
2012-10-31 7   22
2012-11-30 8   22
2012-12-31 9   22

PS:pd.IndexSlice effectively replaces xs。虽然它们不是解决方案,但您可能会发现multiIndexing using slicers对您有用。