I have the following DataFrame in pandas:
ID date no start end
1 01-01-2019 10 101.23 112.23
2 02-01-2019 10 112.23 120.43
3 03-01-2019 10 121.23 130.23
4 04-01-2019 10 130.23 140.43
5 01-01-2019 11 101 112
6 02-01-2019 11 112 120
7 03-01-2019 11 130 140
8 04-01-2019 11 140 150.43
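Within each group of the no column, I want to compare the end value of the current row with the start value of the next row. If they differ, I want to set a flag on that next row and compute the difference.
This is the DataFrame I want: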
ID date no start end flag diff
1 01-01-2019 10 101.23 112.23 0 0
2 02-01-2019 10 112.23 120.43 0 0
3 03-01-2019 10 121.23 130.23 1 0.8
4 04-01-2019 10 130.23 140.43 0 0
5 01-01-2019 11 101 112 0 0
6 02-01-2019 11 112 120 0 0
7 03-01-2019 11 130 140 1 10
8 04-01-2019 11 140 150.43 0 0
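How can I do this in pandas?
Answer 0 (score: 3)
You can create a Series with DataFrameGroupBy.shift, replace the leading NaN of each group with Series.fillna, and compare it against start with Series.ne. Convert the resulting boolean mask to integers for the flag column, and subtract the Series from start for the diff column: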
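# previous row's end within each no group; the first row of a group falls back to its own start
s = df.groupby('no')['end'].shift().fillna(df['start'])
# 1 where start differs from the previous end, otherwise 0
df['flag'] = df['start'].ne(s).astype(int)
# size of the gap between start and the previous end
df['diff'] = df['start'] - s
print(df)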
ID date no start end flag diff
0 1 01-01-2019 10 101.23 112.23 0 0.0
1 2 02-01-2019 10 112.23 120.43 0 0.0
2 3 03-01-2019 10 121.23 130.23 1 0.8
3 4 04-01-2019 10 130.23 140.43 0 0.0
4 5 01-01-2019 11 101.00 112.00 0 0.0
5 6 02-01-2019 11 112.00 120.00 0 0.0
6 7 03-01-2019 11 130.00 140.00 1 10.0
7 8 04-01-2019 11 140.00 150.43 0 0.0
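For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the sample data from the question is typed in by hand as a dictionary (the question does not show how df was built), and the name prev_end is only an illustrative stand-in for s.

```python
import pandas as pd

# Sample data copied from the question; building it from a dict is an assumption.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8],
    "date": ["01-01-2019", "02-01-2019", "03-01-2019", "04-01-2019"] * 2,
    "no": [10, 10, 10, 10, 11, 11, 11, 11],
    "start": [101.23, 112.23, 121.23, 130.23, 101.0, 112.0, 130.0, 140.0],
    "end": [112.23, 120.43, 130.23, 140.43, 112.0, 120.0, 140.0, 150.43],
})

# Previous row's end within each group; the first row of a group has no
# predecessor, so it falls back to its own start and never gets flagged.
prev_end = df.groupby("no")["end"].shift().fillna(df["start"])

# 1 where start differs from the previous end, otherwise 0.
df["flag"] = df["start"].ne(prev_end).astype(int)

# Gap between start and the previous end.
df["diff"] = df["start"] - prev_end

print(df)
```

Note that the subtraction is done in floating point, so a value such as the 0.8 above may carry a tiny rounding error internally even though it prints as 0.8. If that matters, rounding the column (for example with df["diff"].round(2)) is one option.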
Details:
print(s)
0 101.23
1 112.23
2 120.43
3 130.23
4 101.00
5 112.00
6 120.00
7 140.00
Name: end, dtype: float64
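To see why the fillna step is needed, it may help to look at the shifted values before they are filled. The sketch below is only an illustration: it rebuilds just the three relevant columns from the question's data, and the names shifted and first_in_group are hypothetical helpers, not part of the answer above.

```python
import pandas as pd

# Only the columns needed for this step; values copied from the question.
df = pd.DataFrame({
    "no": [10, 10, 10, 10, 11, 11, 11, 11],
    "start": [101.23, 112.23, 121.23, 130.23, 101.0, 112.0, 130.0, 140.0],
    "end": [112.23, 120.43, 130.23, 140.43, 112.0, 120.0, 140.0, 150.43],
})

# shift() moves end down by one row inside each group, so the first row
# of every group has no previous value and becomes NaN.
shifted = df.groupby("no")["end"].shift()
first_in_group = shifted.isna()

# Filling those NaN values with the row's own start makes the first row of
# each group compare equal to itself, giving flag 0 and diff 0.
s = shifted.fillna(df["start"])

print(pd.DataFrame({"no": df["no"], "shifted": shifted, "s": s}))
```

Because the shift is applied per group, the last end of group 10 (140.43) never leaks into the first row of group 11.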