将具有不同行的多个Excel文件合并到pandas

时间:2017-01-20 09:28:21

标签: python excel pandas

我有4个Excel文件,我必须合并到一个Excel文件中。 包含ID,姓名缩写,年龄和性别的人口统计文件。 包含ID,缩写测试名称,测试日期和测试值的实验室文件。 包含ID,姓名缩写,医疗状况,开始和停止日期的病史。 给予的药物含有ID,缩写,药物名称,剂量,频率,开始和停止日期。

有50名患者。人口统计文件包含50行50名患者。其余文件有50名患者,但在100到400行之间,因为每位患者都有多个实验室测试或多种药物。

当我合并大熊猫时,我有重复或将实体分配给错误的患者。面临的挑战是这样做,以便在患者服用的药物多于实验室检测的情况下,实验室检测应该用空白替换重复药物。

这是一个缩短的表示形式:

import pandas as pd 
lab = pd.read_excel('data/data.xlsx', sheetname='lab') 
drugs = pd.read_excel('data/data.xlsx', sheetname='drugs') 
merged_data = pd.merge(drugs, lab, on='ID', how='left')
merged_data.to_excel('merged_data.xls')

你得到这个结果:Pandas merge result

我更喜欢这个结果:Prefered output

1 个答案:

答案 0 :(得分:1)

考虑在cumcount()上使用groupby(),然后使用ID加入该字段:

drugs['GrpCount'] = (drugs.groupby(['ID'])).cumcount()

lab['GrpCount'] = (lab.groupby(['ID'])).cumcount()

merged_data = pd.merge(drugs, lab, on=['ID', 'GrpCount'], how='left').drop(['GrpCount'], axis=1)

#     ID Initials_x                      Drug Name          Frequency          Route   Start Date     End Date Initials_y                    Name Result Date    Result
# 0    1         AB                       AMPICLOX                NaN           Oral  21-Jun-2016  21-Jun-2016         AB  Rapid Diagnostic Test    30-May-16  Abnormal
# 1    1         AB                  CIPROFLOXACIN              Daily           Oral  30-May-2016  03-Jun-2016         AB              Microscopy   30-May-16    Normal
# 2    1         AB        Ibuprofen Tablet 400 mg    Two Times a Day           Oral  06-Oct-2016  10-Oct-2016        NaN                     NaN         NaN       NaN
# 3    1         AB                        COARTEM                NaN           Oral  17-Jun-2016  17-Jun-2016        NaN                     NaN         NaN       NaN
# 4    1         AB          INJECTABLE ARTESUNATE          12 Hourly    Intravenous  01-Jun-2016  02-Jun-2016        NaN                     NaN         NaN       NaN
# 5    1         AB                  COTRIMOXAZOLE              Daily           Oral  30-May-2016  12-Jun-2016        NaN                     NaN         NaN       NaN
# 6    1         AB                  METRONIDAZOLE    Two Times a Day           Oral  30-May-2016  03-Jun-2016        NaN                     NaN         NaN       NaN
# 7    2         SS                     GENTAMICIN              Daily    Intravenous  04-Jun-2016  04-Jun-2016         SS              Microscopy    6-Jun-16  Abnormal
# 8    2         SS                  METRONIDAZOLE           8 Hourly    Intravenous  04-Jun-2016  06-Jun-2016         SS    Complete Blood Count    6-Oct-16  Recorded
# 9    2         SS  Oral Rehydration Salts Powder                PRN           Oral  06-Jun-2016  06-Jun-2016        NaN                     NaN         NaN       NaN
# 10   2         SS                           ZINC           8 Hourly           Oral  06-Jun-2016  06-Jun-2016        NaN                     NaN         NaN       NaN