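I have 2 pandas DataFrames (df1, df2), and I am trying to extract data from them to create a third DataFrame (df3).
df1 has 2 columns: an id column, and a column containing the names of columns in the second DataFrame (df2).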
df1 looks like:
===============
id1 name
--- ----
1 df2_column1_name
5 df2_column2_name
33 df2_column3_name
...
... and so on
df2 looks like:
===============
id2 df2_column1_name df2_column2_name df2_column3_name .... and so on
--- ---------------- ---------------- ----------------
12 Jimmy male 25 ....
16 Becky female 30 ....
75 Mike male 80 ....
....
.... and so on
I am trying to create df3 to look like:
=======================================
column1 Column2 Column3
------- ------- -------
1 12 Jimmy
5 12 male
33 12 25
.
.
1 16 Becky
5 16 female
33 16 30
.
.
1 75 Mike
5 75 male
33 75 80
.
.
.
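The DataFrames can be very large. I am trying to find the most efficient way to do this, without a double loop if possible. Please advise on the best approach. Thanks.
Answer 0: (score: 1)
A stack followed by a merge will get you there: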
In [11]: df2.set_index("id2").stack().reset_index(name='value')
Out[11]:
id2 level_1 value
0 12 df2_column1_name Jimmy
1 12 df2_column2_name male
2 12 df2_column3_name 25
3 16 df2_column1_name Becky
4 16 df2_column2_name female
5 16 df2_column3_name 30
6 75 df2_column1_name Mike
7 75 df2_column2_name male
8 75 df2_column3_name 80
In [12]: df2.set_index("id2").stack().reset_index(name='value').merge(df1, right_on="name", left_on="level_1")
Out[12]:
id2 level_1 value id1 name
0 12 df2_column1_name Jimmy 1 df2_column1_name
1 16 df2_column1_name Becky 1 df2_column1_name
2 75 df2_column1_name Mike 1 df2_column1_name
3 12 df2_column2_name male 5 df2_column2_name
4 16 df2_column2_name female 5 df2_column2_name
5 75 df2_column2_name male 5 df2_column2_name
6 12 df2_column3_name 25 33 df2_column3_name
7 16 df2_column3_name 30 33 df2_column3_name
8 75 df2_column3_name 80 33 df2_column3_name
Finally, you just need to select the columns you want and sort:
In [13]: df2.set_index("id2").stack().reset_index(name='value').merge(df1, right_on="name", left_on="level_1")[["id1", "id2", "value"]].sort_values("id2")
Out[13]:
id1 id2 value
0 1 12 Jimmy
3 5 12 male
6 33 12 25
1 1 16 Becky
4 5 16 female
7 33 16 30
2 1 75 Mike
5 5 75 male
8 33 75 80
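For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. The sample frames are rebuilt from the values shown in the question, so treat them as illustrative. The intermediate names long_df and merged are mine, not part of the original answer, and sorting on both id2 and id1 is an extra assumption so that the row order does not depend on the sort algorithm.

```python
import pandas as pd

# Sample data reconstructed from the question (illustrative values only).
df1 = pd.DataFrame({
    "id1": [1, 5, 33],
    "name": ["df2_column1_name", "df2_column2_name", "df2_column3_name"],
})
df2 = pd.DataFrame({
    "id2": [12, 16, 75],
    "df2_column1_name": ["Jimmy", "Becky", "Mike"],
    "df2_column2_name": ["male", "female", "male"],
    "df2_column3_name": [25, 30, 80],
})

# Wide -> long: one row per (id2, former column name, value).
# The unnamed column level becomes "level_1" after reset_index.
long_df = df2.set_index("id2").stack().reset_index(name="value")

# Attach id1 by matching the former column name against df1["name"].
merged = long_df.merge(df1, left_on="level_1", right_on="name")

# Keep the three requested columns; sort by id2, then id1 (assumed tie-break).
df3 = (
    merged[["id1", "id2", "value"]]
    .sort_values(["id2", "id1"])
    .reset_index(drop=True)
)
print(df3)
```

Because df2 mixes strings and integers, the value column ends up with object dtype. If df1 lists a name that is not a column of df2, the default inner merge silently drops it.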