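Creating new rows and columns based on row data in pandas

Time: 2015-04-28 16:23:59

Tags: python pandas

I have a DataFrame with one row per user event: each user joins my site (confirmed_settings) and may later make a purchase (sale).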

+---+-----+--------------------+---------+--------+-----+
|   | uid |        msg         |  _time  | gender | age |
+---+-----+--------------------+---------+--------+-----+
| 0 |   1 | confirmed_settings | 1/29/15 | M      |  37 |
| 1 |   1 | sale               | 4/13/15 | M      |  37 |
| 2 |   3 | confirmed_settings | 4/19/15 | M      |  35 |
| 3 |   4 | confirmed_settings | 2/21/15 | M      |  21 |
| 4 |   5 | confirmed_settings | 3/28/15 | M      |  18 |
| 5 |   4 | sale               | 3/15/15 | M      |  21 |
+---+-----+--------------------+---------+--------+-----+

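I want to reshape the DataFrame so that each row is unique per uid, with columns named sale and confirmed_settings that hold the timestamp of the corresponding action. Note that not every user has a sale, but every user has a confirmed_settings. It should look like this: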

+---+-----+--------------------+---------+---------+--------+-----+
|   | uid | confirmed_settings |  sale   |  _time  | gender | age |
+---+-----+--------------------+---------+---------+--------+-----+
| 0 |   1 | 1/29/15            | 4/13/15 | 1/29/15 | M      |  37 |
| 1 |   3 | 4/19/15            | null    | 4/19/15 | M      |  35 |
| 2 |   4 | 2/21/15            | 3/15/15 | 2/21/15 | M      |  21 |
| 3 |   5 | 3/28/15            | null    | 3/28/15 | M      |  18 |
+---+-----+--------------------+---------+---------+--------+-----+

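What is the best pandas idiom/function to achieve this?

1 Answer:

Answer 0: (score: 1)

I don't know whether it's the best solution, but it should work: pivot the msg values into columns keyed by uid, then merge the de-duplicated per-user columns (gender, age) back in.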

In [1]: df
Out[1]:
   uid                 msg    _time gender  age
0    1  confirmed_settings  1/29/15      M   37
1    1                sale  4/13/15      M   37
2    3  confirmed_settings  4/19/15      M   35
3    4  confirmed_settings  2/21/15      M   21
4    5  confirmed_settings  3/28/15      M   18
5    4                sale  3/15/15      M   21

In [2]: df1 = df.pivot(index='uid', columns='msg', values='_time').reset_index()
In [3]: df1 = df1.merge(df[['uid', 'gender', 'age']].drop_duplicates(), on='uid')

In [4]: df1
Out[4]: 
   uid confirmed_settings     sale gender  age
0    1            1/29/15  4/13/15      M   37
1    3            4/19/15      NaN      M   35
2    4            2/21/15  3/15/15      M   21
3    5            3/28/15      NaN      M   18
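
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same pivot-then-merge approach. The helper name widen_by_msg is hypothetical, and the last step assumes that the _time column in the desired output simply repeats the confirmed_settings timestamp, which is how the question's example reads but is not stated outright.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "uid": [1, 1, 3, 4, 5, 4],
    "msg": ["confirmed_settings", "sale", "confirmed_settings",
            "confirmed_settings", "confirmed_settings", "sale"],
    "_time": ["1/29/15", "4/13/15", "4/19/15", "2/21/15", "3/28/15", "3/15/15"],
    "gender": ["M"] * 6,
    "age": [37, 37, 35, 21, 18, 21],
})


def widen_by_msg(frame):
    """Hypothetical helper: one row per uid, one timestamp column per msg value."""
    # Spread each msg value into its own column holding the matching _time.
    # Note: pivot raises ValueError if a (uid, msg) pair occurs more than once.
    wide = frame.pivot(index="uid", columns="msg", values="_time").reset_index()
    wide.columns.name = None  # drop the leftover "msg" label on the column axis

    # Per-user attributes, assumed to be constant across a user's rows.
    attrs = frame[["uid", "gender", "age"]].drop_duplicates()
    out = wide.merge(attrs, on="uid")

    # Assumption: the question's _time column repeats the confirmed_settings time.
    out["_time"] = out["confirmed_settings"]
    return out


result = widen_by_msg(df)
print(result)
```

Users without a sale end up with NaN in the sale column. If a user could have several rows with the same msg, pivot will fail; df.pivot_table(index='uid', columns='msg', values='_time', aggfunc='first') is one possible substitute, at the cost of silently keeping only the first timestamp.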