Suppose I have a DataFrame with a MultiIndex, as shown below.
                                ROW_ID  HADM_ID  ICUSTAY_ID
SUBJECT_ID CHARTTIME
23         2157-10-21 12:05:00       1   124321    234044.0
           2157-10-21 14:00:00      30   124321    234044.0
           2157-10-21 19:00:00      77   124321    234044.0
           2157-10-22 00:00:00     148   124321    234044.0
           2157-10-22 04:00:00     197   124321    234044.0
           2157-10-22 08:00:00     226   124321    234044.0
           2157-10-22 16:00:00     320   124321    234044.0
34         2191-02-23 08:00:00     367   144319    290505.0
           2191-02-23 12:00:00     450   144319    290505.0
           2191-02-23 15:00:00     476   144319    290505.0
           2191-02-23 20:00:00     511   144319    290505.0
           2191-02-24 00:00:00     538   144319    290505.0
           2191-02-24 04:00:00     567   144319    290505.0
           2191-02-24 07:00:00     608   144319    290505.0
           2191-02-24 12:00:00     648   144319    290505.0
36         2134-05-12 07:00:00     685   165660    241249.0
           2134-05-12 12:00:00     787   165660    241249.0
           2134-05-12 16:00:00     855   165660    241249.0
           2134-05-12 20:00:00     924   165660    241249.0
           2134-05-13 00:00:00     988   165660    241249.0
SUBJECT_ID and CHARTTIME form the MultiIndex. I want to get, for each SUBJECT_ID, the row with its first CHARTTIME, so the expected output is:
                                ROW_ID  HADM_ID  ICUSTAY_ID
SUBJECT_ID CHARTTIME
23         2157-10-21 12:05:00       1   124321    234044.0
34         2191-02-23 08:00:00     367   144319    290505.0
36         2134-05-12 07:00:00     685   165660    241249.0
I tried using iloc and xs, but neither worked. Any help would be greatly appreciated.
Answer 0 (score: 1)
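To group by an index level, pass the level argument to groupby instead of by. CHARTTIME has to be moved out of the index first, because first() only aggregates columns; afterwards it is appended back to the index.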
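# Move CHARTTIME into a regular column so that first() carries it along.
df = df.reset_index('CHARTTIME')
# first() returns the first non-null value of each column per SUBJECT_ID; rows are assumed sorted by CHARTTIME.
df = df.groupby(level=['SUBJECT_ID']).first().set_index('CHARTTIME', append=True)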
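As a possible alternative, here is a minimal sketch that keeps the MultiIndex intact by taking the first row of each group with head(1). It is not the answer's method. The sample frame is a hypothetical subset built from a few rows of the question's data, and sort_index() is called first on the assumption that "first" means the earliest CHARTTIME.

```python
import pandas as pd

# Hypothetical subset of the question's data, rebuilt for illustration only.
df = pd.DataFrame(
    {
        "SUBJECT_ID": [23, 23, 34, 34, 36, 36],
        "CHARTTIME": pd.to_datetime(
            [
                "2157-10-21 12:05:00",
                "2157-10-21 14:00:00",
                "2191-02-23 08:00:00",
                "2191-02-23 12:00:00",
                "2134-05-12 07:00:00",
                "2134-05-12 12:00:00",
            ]
        ),
        "ROW_ID": [1, 30, 367, 450, 685, 787],
        "HADM_ID": [124321, 124321, 144319, 144319, 165660, 165660],
        "ICUSTAY_ID": [234044.0, 234044.0, 290505.0, 290505.0, 241249.0, 241249.0],
    }
).set_index(["SUBJECT_ID", "CHARTTIME"])

# Sort so that the first row of each group is the earliest CHARTTIME,
# then keep one row per SUBJECT_ID; head(1) preserves the original index.
result = df.sort_index().groupby(level="SUBJECT_ID").head(1)
print(result)
```

Unlike first(), head(1) returns whole rows, so a missing value in the first row is kept as-is rather than being replaced by a later non-null value from the same group.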