Is there a good way to append data horizontally (wide) based on a condition?

Time: 2018-01-15 12:54:24

Tags: python pandas dataframe join reshape

df1
+-----+-----+-----+-----+
| id  | rc  | fq  | mt  |
+-----+-----+-----+-----+
|   1 | a   |   3 |  13 |
|   2 | b   |   2 |  31 |
|   3 | c   |   4 |  23 |
|   4 | d   |   1 |   7 |
|   5 | e   |   6 |   9 |
| ... | ... | ... | ... |
| ... | ... | ... | ... |
| ... | ... | ... | ... |
+-----+-----+-----+-----+


df2
+----+---------+----------+
| id | keyword | location |
+----+---------+----------+
| 1  | james   | (1,3)    |
| 1  | john    | (2,3)    |
| 2  | daniel  | (3,9)    |
| 3  | peter   | (5,2)    |
| 3  | hugh    | (7,1)    |
| 3  | kevin   | (2,1)    |
| 4  | jack    | (0,8)    |
| 5  | chris   | (4,2)    |
| 5  | lisa    | (9,0)    |
| …  | …       | …        |
| …  | …       | …        |
| …  | …       | …        |
+----+---------+----------+

df3
+----+----+----+----+----------+-----------+----------+-----------+----------+-----------+---+-----------+------------+
| id | rc | fq | mt | keyword1 | location1 | keyword2 | location2 | keyword3 | location3 | … | keyword_n | location_n |
+----+----+----+----+----------+-----------+----------+-----------+----------+-----------+---+-----------+------------+
|  1 | a  |  3 | 13 | james    | (1,3)     | john     | (2,3)     |          |           | … |           |            |
|  2 | b  |  2 | 31 | daniel   | (3,9)     |          |           |          |           | … |           |            |
|  3 | c  |  4 | 23 | peter    | (5,2)     | hugh     | (7,1)     | kevin    | (2,1)     | … |           |            |
|  4 | d  |  1 |  7 | jack     | (0,8)     |          |           |          |           | … |           |            |
|  5 | e  |  6 |  9 | chris    | (4,2)     | lisa     | (9,0)     |          |           | … |           |            |
|    |    |    |    |          |           |          |           |          |           | … |           |            |
|    |    |    |    |          |           |          |           |          |           | … |           |            |
|    |    |    |    |          |           |          |           |          |           | … |           |            |
+----+----+----+----+----------+-----------+----------+-----------+----------+-----------+---+-----------+------------+

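df1: I have a wide pandas DataFrame with an 'id' column that holds a unique value for each row.

df2: There is also df2, with 'id', 'keyword' and 'location' columns. The ids in df2 come from df1, so df1 and df2 share 'id' values.

df3: Finally, df3 is the desired output built from df1 and df2. Wherever an id in df2 matches an id in df1, the values in the 'keyword' and 'location' columns should be appended wide, that is, as new columns created horizontally.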
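To make the problem reproducible, the sample rows from the df1 and df2 tables can be built as below. This is only a sketch: storing `location` as a plain string such as "(1,3)" is an assumption, and `max_pairs` is a name introduced here for illustration.

```python
import pandas as pd

# Sample rows copied from the df1 table.
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "rc": ["a", "b", "c", "d", "e"],
    "fq": [3, 2, 4, 1, 6],
    "mt": [13, 31, 23, 7, 9],
})

# Sample rows copied from the df2 table.
# Assumption: location is kept as a plain string such as "(1,3)".
df2 = pd.DataFrame({
    "id": [1, 1, 2, 3, 3, 3, 4, 5, 5],
    "keyword": ["james", "john", "daniel", "peter", "hugh",
                "kevin", "jack", "chris", "lisa"],
    "location": ["(1,3)", "(2,3)", "(3,9)", "(5,2)", "(7,1)",
                 "(2,1)", "(0,8)", "(4,2)", "(9,0)"],
})

# The id with the most rows in df2 decides how many
# keyword/location column pairs df3 needs.
max_pairs = df2.groupby("id").size().max()
print(max_pairs)
```

With these rows, id 3 has the most entries in df2, so df3 needs three keyword/location pairs.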

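Hi all,

I have attached images to explain this better. (Please take a look!)

I tried loc, concat, merge, pivot_table and so on, but could not figure it out. Could anyone offer some suggestions on this one?

Thanks!

  • Sorry for attaching images instead of inserting ASCII tables. The images have been removed!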

1 answer:

Answer 0: (score: 0)

Use:

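# Number each id's rows 1, 2, 3, ... (as strings, so they can be joined into column names)
s = df2.groupby('id').cumcount().add(1).astype(str)
# Move that counter into the columns, then order them as keyword1, location1, keyword2, ...
df2 = df2.set_index(['id', s]).unstack().sort_index(axis=1, level=1)
# Flatten the two-level column labels: ('keyword', '1') -> 'keyword1'
df2.columns = df2.columns.map(''.join)

# Left-join the widened table onto df1 by matching df1['id'] against its index
df = df1.join(df2, on='id')
print(df)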
   id rc  fq  mt keyword1 location1 keyword2 location2 keyword3 location3
0   1  a   3  13    james     (1,3)     john     (2,3)     None      None
1   2  b   2  31   daniel     (3,9)     None      None     None      None
2   3  c   4  23    peter     (5,2)     hugh     (7,1)    kevin     (2,1)
3   4  d   1   7     jack     (0,8)     None      None     None      None
4   5  e   6   9    chris     (4,2)     lisa     (9,0)     None      None
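
The same steps can be wrapped in a small function that leaves df2 untouched. This is a sketch rather than the answer's exact code: `append_wide` is a hypothetical name, and it keeps the per-id counter as an integer until the column names are built, a small change so that position 10 would sort after position 2 instead of before it. The sample rows are the first three ids from the tables in the question.

```python
import pandas as pd


def append_wide(left, right, key="id"):
    """Hypothetical helper: spread right's rows per key into numbered
    columns (keyword1, location1, keyword2, ...) and join them onto left."""
    # 1-based position of each row within its key group (kept as int).
    pos = right.groupby(key).cumcount().add(1)
    wide = (
        right.set_index([key, pos])
        .unstack()                    # columns become (field, position)
        .sort_index(axis=1, level=1)  # order by position, then field name
    )
    # Flatten ('keyword', 1) -> 'keyword1'.
    wide.columns = [f"{field}{n}" for field, n in wide.columns]
    # Left join: every row of `left` is kept, matched on the key column.
    return left.join(wide, on=key)


df1 = pd.DataFrame({"id": [1, 2, 3], "rc": ["a", "b", "c"],
                    "fq": [3, 2, 4], "mt": [13, 31, 23]})
df2 = pd.DataFrame({
    "id": [1, 1, 2, 3, 3, 3],
    "keyword": ["james", "john", "daniel", "peter", "hugh", "kevin"],
    "location": ["(1,3)", "(2,3)", "(3,9)", "(5,2)", "(7,1)", "(2,1)"],
})

df3 = append_wide(df1, df2)
print(df3)
```

Ids with fewer entries than the widest id get missing values in the trailing columns. Whether these display as None or NaN depends on the pandas version.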