Code:
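import pandas as pd

# Parse transaction_date as datetime so that date arithmetic works later on
df = pd.read_csv('xyz.csv', usecols=['transaction_date', 'amount'],
                 parse_dates=['transaction_date'])
# Keep only the amounts that occur more than 3 times
df = pd.concat(g for _, g in df.groupby("amount") if len(g) > 3)
df = df.reset_index(drop=True)
print(df)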
Output:
transaction_date amount
0 2016-06-02 50.0
1 2016-06-02 50.0
2 2016-06-02 50.0
3 2016-06-02 50.0
4 2016-06-02 50.0
5 2016-06-02 50.0
6 2016-07-04 50.0
7 2016-07-04 50.0
8 2016-09-29 225.0
9 2016-10-29 225.0
10 2016-11-29 225.0
11 2016-12-30 225.0
12 2017-01-30 225.0
13 2016-05-16 1000.0
14 2016-05-20 1000.0
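The filtering step in the question's code (keep amounts that occur more than 3 times) can also be written with groupby(...).filter. This is a minimal sketch on a small hypothetical DataFrame, not the poster's xyz.csv. One difference worth knowing: filter keeps the rows in their original order, whereas concatenating the groups emits them group by group, sorted by amount, so the row order can differ when several amounts survive.

```python
import pandas as pd

# Hypothetical sample data standing in for xyz.csv (not from the original post).
df = pd.DataFrame({
    "transaction_date": pd.to_datetime([
        "2016-06-02", "2016-05-16", "2016-06-02", "2016-06-02",
        "2016-06-02", "2016-05-20", "2016-09-29",
    ]),
    "amount": [50.0, 1000.0, 50.0, 50.0, 50.0, 1000.0, 225.0],
})

# Keep only rows whose amount occurs more than 3 times.
# filter returns a subset of the original rows, in their original order.
kept = df.groupby("amount").filter(lambda g: len(g) > 3).reset_index(drop=True)
print(kept)
```

Here only amount 50.0 appears more than 3 times, so kept holds those 4 rows.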
I need to add another column next to amount that gives the difference, in days, between consecutive rows of transaction_date, e.g.:
transaction_date amount delta(days)
0 2016-06-02 50.0 -
1 2016-06-02 50.0 0
2 2016-06-02 50.0 0
3 2016-06-02 50.0 0
4 2016-06-02 50.0 0
5 2016-06-02 50.0 0
6 2016-07-04 50.0 32
7 2016-07-04 50.0 .
8 2016-09-29 225.0 .
9 2016-10-29 225.0 .
10 2016-11-29 225.0
Answer 0 (score: 0)
There may be better ways, but you can use pandas.Series.shift:
>>> df.transaction_date.shift(-1) - df.transaction_date
0 0 days
1 0 days
2 0 days
3 0 days
4 0 days
5 32 days
6 0 days
7 87 days
8 30 days
9 31 days
10 31 days
11 31 days
12 -259 days
13 4 days
14 NaT
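Note that shift(-1) puts each difference on the earlier row (next date minus current date), so the last row gets NaT. The desired output in the question puts it on the later row instead, which is shift(1). A small sketch on hypothetical dates comparing the two:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2016-06-02", "2016-06-02", "2016-07-04", "2016-09-29"]))

# shift(-1): next date minus current date -> difference lands on the EARLIER row,
# and the last row has no successor (NaT).
forward = dates.shift(-1) - dates

# shift(1): current date minus previous date -> difference lands on the LATER row,
# matching the layout asked for in the question (first row is NaT).
backward = dates - dates.shift(1)

# .dt.days converts the timedeltas to day counts (NaT becomes NaN).
print(forward.dt.days.tolist())
print(backward.dt.days.tolist())
```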
Answer 1 (score: 0)
# Difference to the previous row, converted to whole days
df['delta(days)'] = df['transaction_date'].diff().dt.days
print(df)
transaction_date amount delta(days)
0 2016-06-02 50.0 NaN
1 2016-06-02 50.0 0.0
2 2016-06-02 50.0 0.0
3 2016-06-02 50.0 0.0
4 2016-06-02 50.0 0.0
5 2016-06-02 50.0 0.0
6 2016-07-04 50.0 32.0
7 2016-07-04 50.0 0.0
8 2016-09-29 225.0 87.0
9 2016-10-29 225.0 30.0
10 2016-11-29 225.0 31.0
11 2016-12-30 225.0 31.0
12 2017-01-30 225.0 31.0
13 2016-05-16 1000.0 -259.0
14 2016-05-20 1000.0 4.0
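Series.diff() is equivalent to subtracting the previous row, i.e. x - x.shift(1). This sketch on hypothetical dates checks that, and shows why the column above is float (the leading NaT becomes NaN) and why a negative value appears when the dates are not sorted:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2016-06-02", "2016-07-04", "2016-09-29", "2016-05-16"]))

# diff() is the same as subtracting the previous row: dates - dates.shift(1)
via_diff = dates.diff()
via_shift = dates - dates.shift(1)
assert via_diff.equals(via_shift)

# .dt.days turns the timedeltas into day counts; the leading NaT becomes NaN,
# so the result is float64. The last value is negative because the list
# is not in chronological order.
days = via_diff.dt.days
print(days)
```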
But if you need to compute it per group, add groupby:
# Same difference, but restarted for each amount (first row of each group is NaN)
df['delta(days)'] = df.groupby('amount')['transaction_date'].diff().dt.days
print(df)
transaction_date amount delta(days)
0 2016-06-02 50.0 NaN
1 2016-06-02 50.0 0.0
2 2016-06-02 50.0 0.0
3 2016-06-02 50.0 0.0
4 2016-06-02 50.0 0.0
5 2016-06-02 50.0 0.0
6 2016-07-04 50.0 32.0
7 2016-07-04 50.0 0.0
8 2016-09-29 225.0 NaN
9 2016-10-29 225.0 30.0
10 2016-11-29 225.0 31.0
11 2016-12-30 225.0 31.0
12 2017-01-30 225.0 31.0
13 2016-05-16 1000.0 NaN
14 2016-05-20 1000.0 4.0
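To see the difference between the two versions side by side, here is a sketch on a small hypothetical DataFrame (the column names delta_all and delta_by_amount are illustrative only). The plain diff() subtracts across group boundaries, while groupby('amount') restarts at each new amount:

```python
import pandas as pd

df = pd.DataFrame({
    "transaction_date": pd.to_datetime(
        ["2016-06-02", "2016-07-04", "2016-09-29", "2016-10-29", "2016-05-16", "2016-05-20"]
    ),
    "amount": [50.0, 50.0, 225.0, 225.0, 1000.0, 1000.0],
})

# Plain diff() crosses group boundaries (e.g. first 225.0 row minus last 50.0 row).
df["delta_all"] = df["transaction_date"].diff().dt.days

# groupby(...).diff() restarts at each amount: the first row of every group is NaN.
df["delta_by_amount"] = df.groupby("amount")["transaction_date"].diff().dt.days
print(df)
```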
Answer 2 (score: 0)
To get the exact output you requested (sorting is optional), use shift to compute the timedelta and dt.days to extract the number of days:
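# Make sure the column is datetime, then (optionally) sort chronologically
df.transaction_date = pd.to_datetime(df.transaction_date)
df.sort_values('transaction_date', inplace=True)
# Current date minus previous date, in days (NaN for the first row)
df['delta(days)'] = (df['transaction_date'] - df['transaction_date'].shift(1)).dt.days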
Output:
transaction_date amount delta(days)
13 2016-05-16 1000.0 NaN
14 2016-05-20 1000.0 4.0
0 2016-06-02 50.0 13.0
1 2016-06-02 50.0 0.0
2 2016-06-02 50.0 0.0
3 2016-06-02 50.0 0.0
4 2016-06-02 50.0 0.0
5 2016-06-02 50.0 0.0
6 2016-07-04 50.0 32.0
7 2016-07-04 50.0 0.0
8 2016-09-29 225.0 87.0
9 2016-10-29 225.0 30.0
10 2016-11-29 225.0 31.0
11 2016-12-30 225.0 31.0
12 2017-01-30 225.0 31.0
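Two optional refinements to this answer, sketched on hypothetical data: sort_values uses quicksort by default, which is not stable, so rows with the same date may not keep their original relative order; kind="mergesort" is stable. And because the first row has no predecessor, dt.days returns floats with NaN; casting to pandas' nullable "Int64" dtype gives whole numbers with <NA> instead.

```python
import pandas as pd

df = pd.DataFrame({
    "transaction_date": ["2016-06-02", "2016-05-16", "2016-06-02", "2016-05-20"],
    "amount": [50.0, 1000.0, 50.0, 1000.0],
})
df["transaction_date"] = pd.to_datetime(df["transaction_date"])

# mergesort is stable: rows with equal dates keep their original relative order.
df = df.sort_values("transaction_date", kind="mergesort")

# Same computation as in the answer; cast the float result (NaN in the first row)
# to the nullable integer dtype "Int64", which represents the gap as <NA>.
delta = (df["transaction_date"] - df["transaction_date"].shift(1)).dt.days
df["delta(days)"] = delta.astype("Int64")
print(df)
```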