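I want to use groupby and rank to assign container codes, with the condition that if a container's total weight would exceed 2000, the load should go into the next group. Can pandas do this?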
I have the following data:
+----------+------+--------+
| Load No. | Code | Weight |
+----------+------+--------+
| 1        | 4000 | 200    |
| 2        | 4000 | 1800   |
| 3        | 4000 | 400    |
| 4        | 4000 | 1000   |
| 5        | 5000 | 1000   |
| 6        | 5000 | 800    |
| 7        | 5000 | 1200   |
+----------+------+--------+
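For anyone who wants to reproduce this, the table can be built as a DataFrame roughly as follows. This is only a sketch: the column names are assumptions (`LoadNo.` follows the spelling in the answer's printed frame), and the original data source is not shown.

```python
import pandas as pd

# Column names are assumed, not taken from the original data source.
df = pd.DataFrame({
    "LoadNo.": [1, 2, 3, 4, 5, 6, 7],
    "Code": [4000, 4000, 4000, 4000, 5000, 5000, 5000],
    "Weight": [200, 1800, 400, 1000, 1000, 800, 1200],
})
print(df)
```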
Expected output:
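+----------+------+--------+-----------+-----------+
| Load No. | Code | Weight | Container | Total Sum |
+----------+------+--------+-----------+-----------+
| 1        | 4000 | 200    | 1         | 2000      |
| 2        | 4000 | 1800   | 1         | 2000      |
| 3        | 4000 | 400    | 2         | 1400      |
| 4        | 4000 | 1000   | 2         | 1400      |
| 5        | 5000 | 1000   | 3         | 1800      |
| 6        | 5000 | 800    | 3         | 1800      |
| 7        | 5000 | 1200   | 4         | 1200      |
+----------+------+--------+-----------+-----------+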
Answer 0 (score: 0)
One way to get the Container column is to bin the cumulative Weight in steps of 2000:
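import numpy as np
import pandas as pd
s = df.Weight.cumsum() / 2000  # running total, in units of one container (2000)
pd.cut(s, np.arange(0, max(s) + 1, 1)).cat.codes + 1  # 1-based index of the interval each value falls in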
0 1
1 1
2 2
3 2
4 3
5 3
6 4
dtype: int8
df['container'] = pd.cut(s, np.arange(0, max(s) + 1, 1)).cat.codes + 1
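To make the binning step easier to follow, here is a small standalone sketch that prints the intermediate values for the sample weights. The variable names `weight`, `bins` and `container` are only illustrative. Note that `pd.cut` uses right-closed intervals by default, so a running total of exactly 2000 still lands in the first container.

```python
import numpy as np
import pandas as pd

# Sample weights from the question, in load order.
weight = pd.Series([200, 1800, 400, 1000, 1000, 800, 1200], name="Weight")

s = weight.cumsum() / 2000           # roughly 0.1, 1.0, 1.2, 1.7, 2.2, 2.6, 3.2
bins = np.arange(0, s.max() + 1, 1)  # interval edges 0, 1, 2, 3, 4
# Intervals are (0, 1], (1, 2], ...; the category code is the interval index.
container = pd.cut(s, bins).cat.codes + 1

print(container.tolist())  # [1, 1, 2, 2, 3, 3, 4]
```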
Then we use transform to get the total weight of each container:
df['total sum'] = df.groupby('container').Weight.transform('sum')
df
   LoadNo.  Code  Weight  container  total sum
0        1  4000     200          1       2000
1        2  4000    1800          1       2000
2        3  4000     400          2       1400
3        4  4000    1000          2       1400
4        5  5000    1000          3       1800
5        6  5000     800          3       1800
6        7  5000    1200          4       1200
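One caveat worth keeping in mind: binning the cumulative sum cuts at fixed multiples of 2000 and never restarts the running total when a new container opens, and it does not look at Code at all. For weights such as 1500, 1000, 1500 it would give containers 1, 2, 2, and the second container would hold 2500. If a strict limit is needed, a plain greedy loop is one alternative. The sketch below is a different technique from the answer above, not a drop-in equivalent. The function name `assign_containers` and the `limit` parameter are hypothetical, and starting a new container whenever Code changes is an assumption about what the question intends.

```python
import pandas as pd

def assign_containers(df, limit=2000):
    """Greedy pass (hypothetical helper): open a new container when Code
    changes or when adding the next load would push the total past `limit`."""
    containers = []
    current, running, prev_code = 0, 0, None
    for code, weight in zip(df["Code"], df["Weight"]):
        if prev_code is None or code != prev_code or running + weight > limit:
            current += 1   # start the next container
            running = 0    # and reset its running total
        running += weight
        prev_code = code
        containers.append(current)
    return pd.Series(containers, index=df.index, name="container")

# Sample data from the question (column names assumed).
df = pd.DataFrame({
    "Code": [4000, 4000, 4000, 4000, 5000, 5000, 5000],
    "Weight": [200, 1800, 400, 1000, 1000, 800, 1200],
})
df["container"] = assign_containers(df)
df["total sum"] = df.groupby("container")["Weight"].transform("sum")
print(df)
```

On the sample data this gives the same containers as the binning approach (1, 1, 2, 2, 3, 3, 4). A single load heavier than the limit would still get a container of its own, so that case may need separate handling.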