<pre> df1
+-----+-----+-----+-----+
| id  | rc  | fq  | mt  |
+-----+-----+-----+-----+
| 1   | a   | 3   | 13  |
| 2   | b   | 2   | 31  |
| 3   | c   | 4   | 23  |
| 4   | d   | 1   | 7   |
| 5   | e   | 6   | 9   |
| ... | ... | ... | ... |
| ... | ... | ... | ... |
| ... | ... | ... | ... |
+-----+-----+-----+-----+
</pre>
<pre> df2
+----+---------+----------+
| id | keyword | location |
+----+---------+----------+
| 1  | james   | (1,3)    |
| 1  | john    | (2,3)    |
| 2  | daniel  | (3,9)    |
| 3  | peter   | (5,2)    |
| 3  | hugh    | (7,1)    |
| 3  | kevin   | (2,1)    |
| 4  | jack    | (0,8)    |
| 5  | chris   | (4,2)    |
| 5  | lisa    | (9,0)    |
| …  | …       | …        |
| …  | …       | …        |
| …  | …       | …        |
+----+---------+----------+
</pre>
<pre> df3
+----+----+----+----+----------+-----------+----------+-----------+----------+-----------+---+-----------+------------+
| id | rc | fq | mt | keyword1 | location1 | keyword2 | location2 | keyword3 | location3 | … | keyword_n | location_n |
+----+----+----+----+----------+-----------+----------+-----------+----------+-----------+---+-----------+------------+
| 1  | a  | 3  | 13 | james    | (1,3)     | john     | (2,3)     |          |           | … |           |            |
| 2  | b  | 2  | 31 | daniel   | (3,9)     |          |           |          |           | … |           |            |
| 3  | c  | 4  | 23 | peter    | (5,2)     | hugh     | (7,1)     | kevin    | (2,1)     | … |           |            |
| 4  | d  | 1  | 7  | jack     | (0,8)     |          |           |          |           | … |           |            |
| 5  | e  | 6  | 9  | chris    | (4,2)     | lisa     | (9,0)     |          |           | … |           |            |
|    |    |    |    |          |           |          |           |          |           | … |           |            |
|    |    |    |    |          |           |          |           |          |           | … |           |            |
|    |    |    |    |          |           |          |           |          |           | … |           |            |
+----+----+----+----+----------+-----------+----------+-----------+----------+-----------+---+-----------+------------+
</pre>
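[I have a wide pandas DataFrame with an 'id' column that holds a unique value for each row.] [df1]
[There is also df2, with 'id', 'keyword' and 'location' columns. The IDs in df2 come from df1, so df1 and df2 share 'id' values.] [df2]
[Lastly, df3 is the desired output built from df1 and df2. Where an ID in df2 matches an 'id' in df1, the values in the 'keyword' and 'location' columns should be appended horizontally as new columns.] [df3]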
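Hi everyone,
I have attached pictures to explain this better. (Please check them out!)
I tried loc, concat, merge, pivot_table and so on, but couldn't figure it out. Could anyone give me some advice on this?
Thanks!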
Answer 0 (score: 0)
Use:

- set_index with cumcount by id
- unstack
- sort_index
- map with join to flatten the columns
- join to the first DataFrame
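# Number the rows within each id (1, 2, 3, ...) as strings, ready to be appended to column names
s = df2.groupby('id').cumcount().add(1).astype(str)
# Move the counter into the columns, then order them as keyword1, location1, keyword2, ...
df2 = df2.set_index(['id', s]).unstack().sort_index(axis=1, level=1)
# Flatten the two-level column names: ('keyword', '1') -> 'keyword1'
df2.columns = df2.columns.map(''.join)
# Attach the wide columns to df1 by matching df1['id'] against the index of df2
df = df1.join(df2, on='id')
print(df)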
   id rc  fq  mt keyword1 location1 keyword2 location2 keyword3 location3
0   1  a   3  13    james     (1,3)     john     (2,3)     None      None
1   2  b   2  31   daniel     (3,9)     None      None     None      None
2   3  c   4  23    peter     (5,2)     hugh     (7,1)    kevin     (2,1)
3   4  d   1   7     jack     (0,8)     None      None     None      None
4   5  e   6   9    chris     (4,2)     lisa     (9,0)     None      None
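
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the data is just the five sample ids shown in the question, and the names n, wide and df3 are only illustrative. It also makes one small change from the code above: the counter is kept as an integer rather than a string, so that the columns would still sort numerically (9 before 10) if some id had ten or more keywords.

```python
import pandas as pd

# Sample data copied from the visible rows of df1 and df2 in the question.
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "rc": ["a", "b", "c", "d", "e"],
    "fq": [3, 2, 4, 1, 6],
    "mt": [13, 31, 23, 7, 9],
})
df2 = pd.DataFrame({
    "id": [1, 1, 2, 3, 3, 3, 4, 5, 5],
    "keyword": ["james", "john", "daniel", "peter", "hugh",
                "kevin", "jack", "chris", "lisa"],
    "location": ["(1,3)", "(2,3)", "(3,9)", "(5,2)", "(7,1)",
                 "(2,1)", "(0,8)", "(4,2)", "(9,0)"],
})

# Number the rows within each id: 1, 2, 3, ... (kept as integers here).
n = df2.groupby("id").cumcount().add(1)

# Move the counter into the columns; ids with fewer keywords get missing values.
wide = df2.set_index(["id", n]).unstack()

# Order the columns by the integer counter: keyword 1, location 1, keyword 2, ...
wide = wide.sort_index(axis=1, level=1)

# Flatten the two-level column names: ('keyword', 1) -> 'keyword1'.
wide.columns = [f"{name}{i}" for name, i in wide.columns]

# Attach the wide columns to df1 by matching df1['id'] against wide's index.
df3 = df1.join(wide, on="id")
print(df3)
```

Whether the empty cells print as None or NaN may depend on the pandas version, so it is safer to test them with pd.isna than to compare against a specific value.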