Manipulating pandas columns

Date: 2019-06-03 17:21:24

Tags: python, pandas

I have some data (the columns up through Event) and the expected output (the Key and Time columns), as shown below:

+----------+------------+-------+-----+------+
| Location |    Date    | Event | Key | Time |
+----------+------------+-------+-----+------+
| i2       | 2019-03-02 |     1 | a   |      |
| i2       | 2019-03-02 |     1 | a   |      |
| i2       | 2019-03-02 |     1 | a   |      |
| i2       | 2019-03-04 |     1 | a   |    2 |
| i2       | 2019-03-15 |     2 | b   |    0 |
| i9       | 2019-02-22 |     2 | c   |    0 |
| i9       | 2019-03-10 |     3 | d   |      |
| i9       | 2019-03-10 |     3 | d   |    0 |
| s8       | 2019-04-22 |     1 | e   |      |
| s8       | 2019-04-25 |     1 | e   |      |
| s8       | 2019-04-28 |     1 | e   |    6 |
| t14      | 2019-05-13 |     3 | f   |    0 |
+----------+------------+-------+-----+------+

A new Key is created whenever Location or Event (or both) changes. I am mainly interested in the Time output, which is the difference in days between the first and last row of each Key. If a Key has only one row, Time is 0. Do I still need to create Key, or can I get the Time difference directly?
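For reproducibility, the sample frame can be rebuilt from the table above (a minimal sketch: only the input columns through Key are included, and Date is parsed to datetime so day arithmetic works in the answers below):

```python
import pandas as pd

# Input columns copied from the question's table; Time is what we want
# to compute, so it is left out here.
df = pd.DataFrame({
    'Location': ['i2', 'i2', 'i2', 'i2', 'i2', 'i9', 'i9', 'i9',
                 's8', 's8', 's8', 't14'],
    'Date': ['2019-03-02', '2019-03-02', '2019-03-02', '2019-03-04',
             '2019-03-15', '2019-02-22', '2019-03-10', '2019-03-10',
             '2019-04-22', '2019-04-25', '2019-04-28', '2019-05-13'],
    'Event': [1, 1, 1, 1, 2, 2, 3, 3, 1, 1, 1, 3],
    'Key': list('aaaabcddeeef'),
})
# Parse dates so subtracting them yields Timedelta values.
df['Date'] = pd.to_datetime(df['Date'])
```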

2 answers:

Answer 0 (score: 4):

I don't think you need to create Key here; grouping by Location and Event and masking with `duplicated` is enough:

df['Time'] = (df.groupby(['Location', 'Event'])['Date']
                .transform(lambda x: x.iloc[-1] - x.iloc[0])
                [~df.duplicated(['Location', 'Event'], keep='last')])
df
Out[107]: 
   Location       Date Event Key   Time
0        i2 2019-03-02     1   a    NaT
1        i2 2019-03-02     1   a    NaT
2        i2 2019-03-02     1   a    NaT
3        i2 2019-03-04     1   a 2 days
4        i2 2019-03-15     2   b 0 days
5        i9 2019-02-22     2   c 0 days
6        i9 2019-03-10     3   d    NaT
7        i9 2019-03-10     3   d 0 days
8        s8 2019-04-22     1   e    NaT
9        s8 2019-04-25     1   e    NaT
10       s8 2019-04-28     1   e 6 days
11      t14 2019-05-13     3   f 0 days
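The expected output holds integer day counts rather than Timedelta values; the column above can be converted with `.dt.days`. A small sketch, using a hypothetical stand-in series for the computed Time column:

```python
import pandas as pd

# Stand-in for the Time column produced above: two timedeltas and a NaT.
s = pd.Series(pd.to_timedelta(['2 days', '0 days', None]))
# .dt.days extracts the day component; NaT becomes NaN, so the result
# is a float column unless the NaNs are filled or dropped.
days = s.dt.days
```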

Answer 1 (score: 0):

A vectorized approach:

df['Date'] = pd.to_datetime(df['Date'])
# Mark the last row of each Key: rows whose next Key differs.
# (The final row compares against NaN and is therefore marked too.)
df['diff'] = df['Key'].ne(df['Key'].shift(-1)).astype(int)
# Peak-to-peak (max - min) of the dates within each group.
x = df.groupby(['Location', 'Event'])['Date'].transform(lambda d: d.max() - d.min())
df.loc[df['diff'] == 1, 'date_diff'] = x
df

   Location       Date  Event Key Time  diff date_diff
0        i2 2019-03-02      1   a          0       NaT
1        i2 2019-03-02      1   a          0       NaT
2        i2 2019-03-02      1   a          0       NaT
3        i2 2019-03-04      1   a    2     1    2 days
4        i2 2019-03-15      2   b    0     1    0 days
5        i9 2019-02-22      2   c    0     1    0 days
6        i9 2019-03-10      3   d          0       NaT
7        i9 2019-03-10      3   d    0     1    0 days
8        s8 2019-04-22      1   e          0       NaT
9        s8 2019-04-25      1   e          0       NaT
10       s8 2019-04-28      1   e    6     1    6 days
11      t14 2019-05-13      3   f          1    0 days
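As for whether Key needs to be created at all: both answers show it does not, but if you do want the column, it can be derived directly from the grouping with `groupby(...).ngroup()`. A sketch on a hypothetical smaller frame (the labels are group numbers rather than letters; only group identity matters):

```python
import pandas as pd

df = pd.DataFrame({
    'Location': ['i2', 'i2', 'i2', 'i9', 'i9'],
    'Event': [1, 1, 2, 2, 3],
})
# ngroup() numbers each (Location, Event) group; sort=False keeps
# the groups in order of first appearance.
df['Key'] = df.groupby(['Location', 'Event'], sort=False).ngroup()
print(df['Key'].tolist())  # [0, 0, 1, 2, 3]
```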