I have two DataFrames:
df1 = pd.DataFrame({'ID':pd.Series(['id1','id2','id3']),
'DoB':pd.Series(['15/2','12/3','25/12']),
'Surgery Date':pd.Series(['1/1','2/1','3/1'])})
df2 = {'1':pd.Series(['id1', 'test1', '45']),
'2':pd.Series(['id1', 'test2', '423']),
'3':pd.Series(['id1', 'test3', '5']),
'4':pd.Series(['id2', 'test2', '421'])}
df2 = pd.DataFrame(df2)
# Label the three rows; each column of df2 is one test result
df2 = df2.rename(index={0: 'id', 1: 'test name', 2: 'test value'})
I want to merge the two so that df1 gets a new column for each test name, holding each ID's value for that test. That is:
df1 = pd.DataFrame({'ID':pd.Series(['id1','id2','id3']),
'DoB':pd.Series(['15/2','12/3','25/12']),
'Surgery Date':pd.Series(['1/1','2/1','3/1']),
'Test1':pd.Series([45]),
'Test2':pd.Series([423,421])})
Each ID has a different number of tests, and I would like the code to work on a much larger dataset.
Cheers!
Answer 0 (score: 0)
I broke it down into steps:
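# Transpose to one row per test result, then pivot test names into columns
df2=df2.T.set_index(['id','test name']).unstack()
# Drop the outer 'test value' column level so only test names remain
df2.columns=df2.columns.droplevel()
# Left merge keeps every row of df1, including IDs with no tests
df1.merge(df2,left_on='ID',right_index=True,how='left')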
Out[457]:
DoB ID Surgery Date test1 test2 test3
0 15/2 id1 1/1 45 423 5
1 12/3 id2 2/1 None 421 None
2 25/12 id3 3/1 NaN NaN NaN
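For anyone who wants to run this from scratch, here is a minimal self-contained sketch of the same idea. It is not the code above verbatim: it assumes the test results are already in long form (one row per result, i.e. what df2.T gives you) and uses DataFrame.pivot in place of set_index/unstack. The names long_df, wide and result are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'ID': ['id1', 'id2', 'id3'],
    'DoB': ['15/2', '12/3', '25/12'],
    'Surgery Date': ['1/1', '2/1', '3/1'],
})

# Assumed long-form version of df2 from the question: one row per test result.
long_df = pd.DataFrame(
    [['id1', 'test1', '45'],
     ['id1', 'test2', '423'],
     ['id1', 'test3', '5'],
     ['id2', 'test2', '421']],
    columns=['id', 'test name', 'test value'],
)

# Reshape to one row per id and one column per test name.
wide = long_df.pivot(index='id', columns='test name', values='test value')
wide.columns.name = None  # drop the leftover 'test name' label on the columns

# Left merge so that IDs without any tests (id3) are kept, with NaN values.
result = df1.merge(wide, left_on='ID', right_index=True, how='left')
print(result)
```

Note that pivot raises an error if the same (id, test name) pair appears more than once. If that can happen in the larger dataset, pivot_table with an explicit aggfunc (for example 'first') is one option, but which value to keep is a decision about your data. The values also stay as strings here; pd.to_numeric can convert them if needed.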