Given the following code, which concatenates three DataFrames, I need to flatten the result:
import pandas as pd
raw_data = {
'type_1': [1, 1, 0, 0, 1],
'subject_id_1': ['1', '2', '3', '4', '5'],
'first_name_1': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung']}
df_a = pd.DataFrame(raw_data, columns = ['type_1', 'subject_id_1', 'first_name_1'])
raw_datab = {
'type_2': [1, 1, 0, 0, 0],
'subject_id_2': ['4', '5', '6', '7', '8'],
'first_name_2': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty']}
df_b = pd.DataFrame(raw_datab, columns = ['type_2', 'subject_id_2', 'first_name_2'])
raw_datac = {
'type_3': [1, 1],
'subject_id_3': ['4', '5'],
'first_name_3': ['Joe', 'Paul']}
df_c = pd.DataFrame(raw_datac, columns = ['type_3', 'subject_id_3', 'first_name_3'])
dfs = [df_a.set_index(['type_1','subject_id_1']),
df_b.set_index(['type_2','subject_id_2']),
df_c.set_index(['type_3','subject_id_3'])]
df = pd.concat(dfs, axis=1)
print (df)
The code prints:
    first_name_1 first_name_2 first_name_3
0 3        Allen          NaN          NaN
  4        Alice          NaN          NaN
  6          NaN         Bran          NaN
  7          NaN        Bryce          NaN
  8          NaN        Betty          NaN
1 1         Alex          NaN          NaN
  2          Amy          NaN          NaN
  4          NaN        Billy          Joe
  5       Ayoung        Brian         Paul
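But I need to flatten it. The result should be a list with the following contents, similar to a SQL SELECT result (not all of the data is shown, but you get the idea):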
type_1  subject_id_1  first_name_1  type_2  subject_id_2  first_name_2  ...
0       3             Allen         0       3             NaN           ...
0       4             Alice         0       4             NaN           ...
0       6             NaN           0       6             Bran          ...
0       7             NaN           0       7             Bryce         ...
0       8             NaN           0       8             Betty         ...
1       1             Alex          1       1             NaN           ...
1       2             Amy           1       2             NaN           ...
1       4             NaN           1       4             Billy         ...
1       5             Ayoung        1       5             Brian         ...
Is this possible with pandas?
Answer 0 (score: 3)
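Add drop=False to set_index so the key columns are kept as regular columns, then use str.contains + fillna to fill the gaps in those columns from the index. This gives the expected output: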
dfs = [df_a.set_index(['type_1','subject_id_1'],drop=False),
df_b.set_index(['type_2','subject_id_2'],drop=False),
df_c.set_index(['type_3','subject_id_3'],drop=False)]
df = pd.concat(dfs, axis=1)
# Fill the missing values in every type_* column from index level 0.
type_cols = df.columns.str.contains('type')
df.loc[:, type_cols] = df.loc[:, type_cols].apply(lambda x: x.fillna(df.index.to_frame()[0]).astype(int))
# Fill the missing values in every subject_id_* column from index level 1.
id_cols = df.columns.str.contains('subject_id')
df.loc[:, id_cols] = df.loc[:, id_cols].apply(lambda x: x.fillna(df.index.to_frame()[1]).astype(int))
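For comparison, here is a minimal, self-contained sketch of the same idea that rebuilds the key columns directly from the index levels instead of using fillna. It is not the answer's method, only one possible variant. The names frames, wide and flat are illustrative, and it assumes the first two columns of each frame are its (type, subject_id) key. Unlike the code above, it leaves the subject ids as strings.

```python
import pandas as pd

df_a = pd.DataFrame({
    'type_1': [1, 1, 0, 0, 1],
    'subject_id_1': ['1', '2', '3', '4', '5'],
    'first_name_1': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung']})
df_b = pd.DataFrame({
    'type_2': [1, 1, 0, 0, 0],
    'subject_id_2': ['4', '5', '6', '7', '8'],
    'first_name_2': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty']})
df_c = pd.DataFrame({
    'type_3': [1, 1],
    'subject_id_3': ['4', '5'],
    'first_name_3': ['Joe', 'Paul']})

frames = [df_a, df_b, df_c]  # illustrative name

# Index each frame by its first two columns (assumed to be the key pair),
# then align all three frames on that key with an outer join.
indexed = [f.set_index(list(f.columns[:2])) for f in frames]
wide = pd.concat(indexed, axis=1).sort_index()

# Every row's key lives in the index, so copy it into each key column.
types = wide.index.get_level_values(0).to_numpy()
subjects = wide.index.get_level_values(1).to_numpy()

flat = pd.DataFrame(index=wide.index)
for i in (1, 2, 3):
    flat[f'type_{i}'] = types
    flat[f'subject_id_{i}'] = subjects
    flat[f'first_name_{i}'] = wide[f'first_name_{i}']

# Drop the MultiIndex to get a plain row-numbered table.
flat = flat.reset_index(drop=True)
print(flat)
```

Because the key values are read straight from the index, no dtype conversion is needed after the outer join. If integer subject ids are wanted, flat's subject_id_* columns can be converted with astype(int) afterwards.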