How to cast (long to wide) a data frame based on some conditions

Posted: 2018-11-06 07:26:04

Tags: python pandas dataframe casting

How can I convert this:

patient_id test    test_value      date_taken
11964    HBA1C         8.60        2017-06-14
11964    Glucose     231.00        2017-05-01
11964    Glucose     202.00        2017-07-01
11964    Glucose     194.00        2017-09-02
11964    Creatinine    1.10        2017-05-01
11964    Creatinine    1.28        2017-08-14

into this?

patient_id  hba1c_earliest  hba1c_latest  hba1c_change  glucose_earliest  glucose_latest
     11964            8.60          8.60        0.0000             231.0           194.0

glucose_change  creatinine_earliest  creatinine_latest  creatinine_change
       -0.1602                 1.10               1.28             0.1636

In the wide data frame, the *_earliest columns should contain the lab result with the earliest date_taken for that test, the *_latest columns the result with the latest date_taken, and the *_change columns the relative change (variation), computed as (latest - earliest) / earliest.
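The three column rules can be checked by hand for a single test before reshaping anything. A minimal sketch, assuming the sample rows above are loaded into a DataFrame; the names sample and earliest_latest_change are hypothetical and only used for illustration:

```python
import pandas as pd

# Sample rows from the question, with date_taken parsed as datetime
sample = pd.DataFrame({
    "patient_id": [11964] * 6,
    "test": ["HBA1C", "Glucose", "Glucose", "Glucose", "Creatinine", "Creatinine"],
    "test_value": [8.60, 231.00, 202.00, 194.00, 1.10, 1.28],
    "date_taken": pd.to_datetime(["2017-06-14", "2017-05-01", "2017-07-01",
                                  "2017-09-02", "2017-05-01", "2017-08-14"]),
})

def earliest_latest_change(rows):
    """Hypothetical helper: (earliest, latest, relative change) for one test."""
    rows = rows.sort_values("date_taken")          # oldest measurement first
    earliest = rows["test_value"].iloc[0]
    latest = rows["test_value"].iloc[-1]
    return earliest, latest, (latest - earliest) / earliest

glucose = sample[sample["test"] == "Glucose"]
print(earliest_latest_change(glucose))
```

For glucose this gives 231.0, 194.0 and (194 - 231) / 231, about -0.1602, matching the expected glucose_change.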

1 answer:

Answer 0 (score: 1)

Use:

print(df.dtypes)
patient_id             int64 <- dtype does not matter
test                  object <- dtype does not matter
test_value           float64 <- must be numeric
date_taken    datetime64[ns] <- should be datetime, so the sort is chronological
dtype: object
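If the data is read from text (for example a CSV file), date_taken usually arrives as strings and should be converted before sorting. A minimal sketch, assuming the whitespace-separated layout shown in the question; passing parse_dates=["date_taken"] to read_csv would be an equivalent alternative:

```python
import io
import pandas as pd

# The question's table as whitespace-separated text
raw = """patient_id test test_value date_taken
11964 HBA1C 8.60 2017-06-14
11964 Glucose 231.00 2017-05-01
11964 Glucose 202.00 2017-07-01
11964 Glucose 194.00 2017-09-02
11964 Creatinine 1.10 2017-05-01
11964 Creatinine 1.28 2017-08-14
"""

df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df["date_taken"].dtype)   # still strings here, not datetime64 yet

# Convert so that sort_values orders rows by actual date
df["date_taken"] = pd.to_datetime(df["date_taken"])
print(df.dtypes)
```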

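import pandas as pd

# date_taken must be datetime so that sorting is chronological
df['date_taken'] = pd.to_datetime(df['date_taken'])

df = (df.sort_values(['patient_id','test','date_taken'])   # date order within each group
       .groupby(['patient_id','test'])['test_value']
       .agg([('earliest','first'),('latest','last')])      # first/last value per group
       .assign(change = lambda x: (x['latest'] - x['earliest'])/ x['earliest'])
       .unstack()                                          # move test into the columns
       .swaplevel(0,1, axis=1)                             # (stat, test) -> (test, stat)
       .reindex(columns=df['test'].unique(), level=0)      # keep original test order
       )
df.columns = df.columns.map('_'.join)   # ('HBA1C','earliest') -> 'HBA1C_earliest'
df = df.reset_index()
print(df)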
   patient_id  HBA1C_earliest  HBA1C_latest  HBA1C_change  Glucose_earliest  \
0       11964             8.6           8.6           0.0             231.0   

   Glucose_latest  Glucose_change  Creatinine_earliest  Creatinine_latest  \
0           194.0       -0.160173                  1.1               1.28   

   Creatinine_change  
0           0.163636  
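A variant that does not rely on sorting picks, per patient_id/test group, the rows at the minimum and maximum date_taken with idxmin/idxmax. This is a sketch of a swapped-in technique, not the answer's method; the function name wide_summary is hypothetical, it assumes a unique row index, and with duplicate dates idxmin/idxmax return the first matching row, which can differ from the sort-based version:

```python
import pandas as pd

def wide_summary(df):
    """Hypothetical helper: long-to-wide summary via rows at min/max date."""
    keys = ["patient_id", "test"]
    dates = df.groupby(keys)["date_taken"]
    # idxmin/idxmax give the row labels of the earliest/latest row per group
    earliest = df.loc[dates.idxmin().to_numpy()].set_index(keys)["test_value"]
    latest = df.loc[dates.idxmax().to_numpy()].set_index(keys)["test_value"]
    out = pd.DataFrame({"earliest": earliest, "latest": latest})
    out["change"] = (out["latest"] - out["earliest"]) / out["earliest"]
    # Same reshaping as in the answer: test into columns, then flatten
    out = out.unstack().swaplevel(0, 1, axis=1)
    out = out.reindex(columns=df["test"].unique(), level=0)
    out.columns = out.columns.map("_".join)
    return out.reset_index()
```

Usage: wide_summary(df) on the question's data, with date_taken already converted by pd.to_datetime.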

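Explanation:

  1. Sort by several columns with sort_values, so rows are in date order within each patient_id/test group.
  2. Aggregate with agg: first fills the earliest column and last fills the latest column.
  3. Create the new change column with assign.
  4. Reshape with unstack, moving the test level from the index into the columns.
  5. Swap the levels of the column MultiIndex with swaplevel, so test becomes the outer level.
  6. Reorder the outer level with reindex, so the tests appear in the same order as in the original test column.
  7. Flatten the column MultiIndex into single names with map and '_'.join.
  8. Finally, reset_index turns the patient_id index back into a column.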
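Steps 5 and 7 only touch the column labels, so they can be tried on a small MultiIndex in isolation. A minimal sketch using hypothetical labels for a single test:

```python
import pandas as pd

# Column labels as they look right after unstack: (statistic, test)
cols = pd.MultiIndex.from_tuples([("earliest", "Glucose"),
                                  ("latest", "Glucose"),
                                  ("change", "Glucose")])

swapped = cols.swaplevel(0, 1)   # step 5: (test, statistic)
flat = swapped.map("_".join)     # step 7: join each tuple into one string
print(list(flat))                # ['Glucose_earliest', 'Glucose_latest', 'Glucose_change']
```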