Python: create a new date column from an existing date column by subtracting consecutive rows

Time: 2017-05-18 11:51:00

Tags: python pandas datetime

Code:

import pandas as pd

df = pd.read_csv('xyz.csv', usecols=['transaction_date', 'amount'])
# Keep only the amounts that occur more than 3 times
df = pd.concat(g for _, g in df.groupby("amount") if len(g) > 3)
df = df.reset_index(drop=True)
print(df)

Output:

    transaction_date    amount
0         2016-06-02      50.0
1         2016-06-02      50.0
2         2016-06-02      50.0
3         2016-06-02      50.0
4         2016-06-02      50.0
5         2016-06-02      50.0
6         2016-07-04      50.0
7         2016-07-04      50.0
8         2016-09-29     225.0
9         2016-10-29     225.0
10        2016-11-29     225.0
11        2016-12-30     225.0
12        2017-01-30     225.0
13        2016-05-16    1000.0
14        2016-05-20    1000.0

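I need to add another column next to the amount column that gives the difference in days between each row's transaction_date and that of the previous row, e.g.: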

     transaction_date   amount  delta(days)
0         2016-06-02      50.0     -
1         2016-06-02      50.0     0
2         2016-06-02      50.0     0
3         2016-06-02      50.0     0
4         2016-06-02      50.0     0
5         2016-06-02      50.0     0
6         2016-07-04      50.0    32
7         2016-07-04      50.0    .
8         2016-09-29     225.0    .
9         2016-10-29     225.0    .
10        2016-11-29     225.0

3 answers:

Answer 0: (score: 0)

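There may be a better way, but you can use pandas.Series.shift. Note that shift(-1) subtracts each row from the one after it, so every difference appears one row earlier than in the requested output and the last row is NaT: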

>>> df.transaction_date.shift(-1) - df.transaction_date
0       0 days
1       0 days
2       0 days
3       0 days
4       0 days
5      32 days
6       0 days
7      87 days
8      30 days
9      31 days
10     31 days
11     31 days
12   -259 days
13      4 days
14         NaT
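
To line the differences up with the requested output (each row compared with the row before it), the shift direction can be reversed. The following is a minimal sketch on a hypothetical four-row sample that mirrors the question's data; the sample frame is an assumption, not the asker's file.

```python
import pandas as pd

# Hypothetical sample mirroring a few rows of the question's data
df = pd.DataFrame({
    "transaction_date": pd.to_datetime(
        ["2016-06-02", "2016-06-02", "2016-07-04", "2016-09-29"]
    ),
    "amount": [50.0, 50.0, 50.0, 225.0],
})

# shift(1) moves every date down one row, so this is current - previous
delta = df["transaction_date"] - df["transaction_date"].shift(1)

# .dt.days turns the timedeltas into day counts; the first row has no
# predecessor, so it becomes NaN and the column is float
df["delta(days)"] = delta.dt.days
print(df)
```

With shift(1) the first row is empty instead of the last, which matches the "-" in the first row of the desired table.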

Answer 1: (score: 0)

I think you need diff + dt.days:

df['delta(days)'] = df['transaction_date'].diff().dt.days
print(df)
   transaction_date  amount  delta(days)
0        2016-06-02    50.0          NaN
1        2016-06-02    50.0          0.0
2        2016-06-02    50.0          0.0
3        2016-06-02    50.0          0.0
4        2016-06-02    50.0          0.0
5        2016-06-02    50.0          0.0
6        2016-07-04    50.0         32.0
7        2016-07-04    50.0          0.0
8        2016-09-29   225.0         87.0
9        2016-10-29   225.0         30.0
10       2016-11-29   225.0         31.0
11       2016-12-30   225.0         31.0
12       2017-01-30   225.0         31.0
13       2016-05-16  1000.0       -259.0
14       2016-05-20  1000.0          4.0

But if you need the difference computed within each group, add groupby:

df['delta(days)'] = df.groupby('amount')['transaction_date'].diff().dt.days
print(df)
   transaction_date  amount  delta(days)
0        2016-06-02    50.0          NaN
1        2016-06-02    50.0          0.0
2        2016-06-02    50.0          0.0
3        2016-06-02    50.0          0.0
4        2016-06-02    50.0          0.0
5        2016-06-02    50.0          0.0
6        2016-07-04    50.0         32.0
7        2016-07-04    50.0          0.0
8        2016-09-29   225.0          NaN
9        2016-10-29   225.0         30.0
10       2016-11-29   225.0         31.0
11       2016-12-30   225.0         31.0
12       2017-01-30   225.0         31.0
13       2016-05-16  1000.0          NaN
14       2016-05-20  1000.0          4.0
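
The per-group diff only gives meaningful gaps when the dates are ordered inside each group, and the NaN at the start of each group forces the column to float. A possible variation, sketched on a hypothetical sample, sorts within each amount first and then casts to pandas' nullable Int64 dtype so the day counts stay integers. The sample data and the choice of Int64 are assumptions, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample, deliberately out of date order within each amount
df = pd.DataFrame({
    "transaction_date": pd.to_datetime(
        ["2016-07-04", "2016-06-02", "2016-10-29", "2016-09-29", "2016-11-29"]
    ),
    "amount": [50.0, 50.0, 225.0, 225.0, 225.0],
})

# Order the rows by amount, then by date within each amount
df = df.sort_values(["amount", "transaction_date"]).reset_index(drop=True)

# diff() restarts in every group, so the first row of each group is missing
days = df.groupby("amount")["transaction_date"].diff().dt.days

# Nullable integer dtype: missing values show as <NA> instead of NaN
df["delta(days)"] = days.astype("Int64")
print(df)
```

Whether to keep the float column or switch to Int64 is a matter of taste; the nullable dtype mainly helps when the column is later exported or compared against integers.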

Answer 2: (score: 0)

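To get exactly the output you asked for (sorting is optional), use shift to compute the timedelta between consecutive rows and dt.days to extract the number of days: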

df.transaction_date = pd.to_datetime(df.transaction_date)
df.sort_values('transaction_date', inplace=True)
df['delta(days)'] = (df['transaction_date'] - df['transaction_date'].shift(1)).dt.days

Output:

   transaction_date  amount  delta(days)
13       2016-05-16  1000.0          NaN
14       2016-05-20  1000.0          4.0
0        2016-06-02    50.0         13.0
1        2016-06-02    50.0          0.0
2        2016-06-02    50.0          0.0
3        2016-06-02    50.0          0.0
4        2016-06-02    50.0          0.0
5        2016-06-02    50.0          0.0
6        2016-07-04    50.0         32.0
7        2016-07-04    50.0          0.0
8        2016-09-29   225.0         87.0
9        2016-10-29   225.0         30.0
10       2016-11-29   225.0         31.0
11       2016-12-30   225.0         31.0
12       2017-01-30   225.0         31.0
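
The pd.to_datetime step above is needed because read_csv loads the dates as plain strings by default. As an alternative sketch, the column can be parsed while reading via the parse_dates argument. The inline CSV text below is a hypothetical stand-in for the question's xyz.csv.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the question's xyz.csv
csv_text = """transaction_date,amount
2016-05-20,1000.0
2016-05-16,1000.0
2016-06-02,50.0
"""

# parse_dates converts the column to datetime64 during loading
df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["transaction_date", "amount"],
    parse_dates=["transaction_date"],
)

# Sort chronologically, then take the day difference to the previous row
df = df.sort_values("transaction_date").reset_index(drop=True)
df["delta(days)"] = df["transaction_date"].diff().dt.days
print(df)
```

Here diff() is used in place of the explicit subtraction with shift(1); for a single column the two give the same result.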