I have a DataFrame with one row for each event where a user joins my site or makes a purchase.
+---+-----+--------------------+---------+--------+-----+
| | uid | msg | _time | gender | age |
+---+-----+--------------------+---------+--------+-----+
| 0 | 1 | confirmed_settings | 1/29/15 | M | 37 |
| 1 | 1 | sale | 4/13/15 | M | 37 |
| 2 | 3 | confirmed_settings | 4/19/15 | M | 35 |
| 3 | 4 | confirmed_settings | 2/21/15 | M | 21 |
| 4 | 5 | confirmed_settings | 3/28/15 | M | 18 |
| 5 | 4 | sale | 3/15/15 | M | 21 |
+---+-----+--------------------+---------+--------+-----+
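I want to reshape the DataFrame so that each row is unique per uid, with columns named sale and confirmed_settings that hold the timestamp of the corresponding action. Note that not every user has a sale, but every user has a confirmed_settings. Like this: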
+---+-----+--------------------+---------+---------+--------+-----+
| | uid | confirmed_settings | sale | _time | gender | age |
+---+-----+--------------------+---------+---------+--------+-----+
| 0 | 1 | 1/29/15 | 4/13/15 | 1/29/15 | M | 37 |
| 1 | 3 | 4/19/15 | null | 4/19/15 | M | 35 |
| 2 | 4 | 2/21/15 | 3/15/15 | 2/21/15 | M | 21 |
| 3 | 5 | 3/28/15 | null | 3/28/15 | M | 18 |
+---+-----+--------------------+---------+---------+--------+-----+
What is the best pandas idiom/function to achieve this?
Answer 0 (score: 1)
I don't know whether this is the best solution, but it should work:
In [1]: df
Out[1]:
uid msg _time gender age
0 1 confirmed_settings 1/29/15 M 37
1 1 sale 4/13/15 M 37
2 3 confirmed_settings 4/19/15 M 35
3 4 confirmed_settings 2/21/15 M 21
4 5 confirmed_settings 3/28/15 M 18
5 4 sale 3/15/15 M 21
In [2]: df1 = df.pivot(index='uid', columns='msg', values='_time').reset_index()
In [3]: df1 = df1.merge(df[['uid', 'gender', 'age']].drop_duplicates(), on='uid')
In [4]: df1
Out[4]:
uid confirmed_settings sale gender age
0 1 1/29/15 4/13/15 M 37
1 3 4/19/15 NaN M 35
2 4 2/21/15 3/15/15 M 21
3 5 3/28/15 NaN M 18
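For anyone who wants to run this outside an interactive session, the two steps above can be sketched as a standalone script. The DataFrame is rebuilt by hand from the table in the question. The names `wide`, `users` and `result` are illustrative and not part of the original answer, and clearing `columns.name` is an optional cosmetic step added here.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "uid": [1, 1, 3, 4, 5, 4],
    "msg": ["confirmed_settings", "sale", "confirmed_settings",
            "confirmed_settings", "confirmed_settings", "sale"],
    "_time": ["1/29/15", "4/13/15", "4/19/15", "2/21/15", "3/28/15", "3/15/15"],
    "gender": ["M"] * 6,
    "age": [37, 37, 35, 21, 18, 21],
})

# One row per uid, one column per distinct msg value.
# Users without a sale get NaN in the sale column.
wide = df.pivot(index="uid", columns="msg", values="_time").reset_index()
wide.columns.name = None  # drop the leftover "msg" label on the column axis

# Re-attach the per-user attributes, assumed constant within each uid.
users = df[["uid", "gender", "age"]].drop_duplicates()
result = wide.merge(users, on="uid")

print(result)
```

Note that `pivot` raises a `ValueError` if a uid has the same msg more than once (for example two sales). In that case `pivot_table` with an explicit `aggfunc` such as `"first"` is one possible substitute, depending on which timestamp should be kept.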