How can I transform this:
patient_id test test_value date_taken
11964 HBA1C 8.60 2017-06-14
11964 Glucose 231.00 2017-05-01
11964 Glucose 202.00 2017-07-01
11964 Glucose 194.00 2017-09-02
11964 Creatinine 1.10 2017-05-01
11964 Creatinine 1.28 2017-08-14
into this?
patient_id hba1c_earliest hba1c_latest hba1c_change glucose_earliest glucose_latest \
11964 8.60 8.60 0.0000 231.0 194.0
glucose_change creatinine_earliest creatinine_latest creatinine_change
-0.1602 1.10 1.28 0.1636
For the expanded dataframe:
The .*_earliest columns should hold the lab result with the earliest date. The .*_latest columns should hold the lab result with the latest date. The .*_change columns should hold the relative change, (Latest - Earliest) / Earliest.
Answer 0 (score: 1)
Use:
print (df.dtypes)
patient_id int64 <- not necessary
test object <- not necessary
test_value float64 <- necessary
date_taken datetime64[ns] <- necessary
dtype: object
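If the columns were read in as text, the two necessary dtypes above can be obtained with a conversion step first. The following is a minimal sketch, assuming a hypothetical frame named raw in which every column is a string and the dates are in ISO format (neither is stated in the question):

```python
import pandas as pd

# Hypothetical raw frame in which every column arrived as text.
raw = pd.DataFrame({
    "patient_id": ["11964", "11964"],
    "test": ["Glucose", "Glucose"],
    "test_value": ["231.00", "194.00"],
    "date_taken": ["2017-05-01", "2017-09-02"],
})

# Convert only the two columns the solution depends on.
df = raw.assign(
    test_value=pd.to_numeric(raw["test_value"]),
    date_taken=pd.to_datetime(raw["date_taken"], format="%Y-%m-%d"),
)
print(df.dtypes)
```

A real datetime column keeps the sort by date_taken chronological whatever the original text format was, and a numeric test_value is required for the change calculation.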
df = (df.sort_values(['patient_id', 'test', 'date_taken'])
        .groupby(['patient_id', 'test'])['test_value']
        .agg([('earliest', 'first'), ('latest', 'last')])
        .assign(change=lambda x: (x['latest'] - x['earliest']) / x['earliest'])
        .unstack()
        .swaplevel(0, 1, axis=1)
        .reindex(columns=df['test'].unique(), level=0)
      )
df.columns = df.columns.map('_'.join)
df = df.reset_index()
print (df)
patient_id HBA1C_earliest HBA1C_latest HBA1C_change Glucose_earliest \
0 11964 8.6 8.6 0.0 231.0
Glucose_latest Glucose_change Creatinine_earliest Creatinine_latest \
0 194.0 -0.160173 1.1 1.28
Creatinine_change
0 0.163636
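Explanation:
1. Sort by multiple columns with sort_values.
2. Aggregate with agg, using first and last to produce the earliest and latest columns.
3. Create the new change column with assign.
4. Reshape with unstack.
5. Swap the levels of the column MultiIndex with swaplevel.
6. Restore the order in which the tests appear in the original column with reindex.
7. Flatten the column MultiIndex with map and join.
8. Finally, turn the index back into a column with reset_index.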
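The steps above can be tried end to end on the sample rows from the question. This is a sketch rather than the answer's exact code: it swaps the list of tuples passed to agg for keyword-style named aggregation, and it lowercases the test names so the columns match the spelling the question asks for (hba1c_earliest and so on). The names wide and test_order are introduced here for illustration only.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "patient_id": [11964] * 6,
    "test": ["HBA1C", "Glucose", "Glucose", "Glucose",
             "Creatinine", "Creatinine"],
    "test_value": [8.60, 231.00, 202.00, 194.00, 1.10, 1.28],
    "date_taken": pd.to_datetime([
        "2017-06-14", "2017-05-01", "2017-07-01",
        "2017-09-02", "2017-05-01", "2017-08-14",
    ]),
})

# Tests in order of first appearance, used to order the output columns.
test_order = df["test"].unique()

wide = (
    df.sort_values(["patient_id", "test", "date_taken"])
      .groupby(["patient_id", "test"])["test_value"]
      # Named aggregation (assumption: pandas 0.25 or later).
      .agg(earliest="first", latest="last")
      .assign(change=lambda x: (x["latest"] - x["earliest"]) / x["earliest"])
      .unstack()
      .swaplevel(0, 1, axis=1)
      .reindex(columns=test_order, level=0)
)

# Flatten the (test, statistic) pairs into lowercase names.
wide.columns = [f"{test.lower()}_{stat}" for test, stat in wide.columns]
wide = wide.reset_index()
print(wide.round(4).to_string(index=False))
```

Because first and last depend on row order within each group, the sort_values call has to come before the groupby. Dropping it would return the first and last rows as they happen to appear in the input.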