I have a dataframe similar to the following:
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1,2,3,4,5,6,7,8,9,10], 'value': [1,2.5,1.1,1.4,1.5,1,1.5,3,1,1.6]})
df['value_at_1'] = np.where(df['value'] == 1, 1, 0)
df
>>>
id value value_at_1
1 1 1
2 2.5 0
3 1.1 0
4 1.4 0
5 1.5 0
6 1 1
7 1.5 0
8 3 0
9 1 1
10 1.6 0
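I want to create a variable that counts the rows since value_at_1 was last 1: it increments by one on every row and restarts at 0 whenever value_at_1 is 1 again. The result should look like this: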
df = pd.DataFrame({'id': [1,2,3,4,5,6,7,8,9,10],
                   'value': [1,2.5,1.1,1.4,1.5,1,1.5,3,1,1.6],
                   'value_at_1': [1,0,0,0,0,1,0,0,1,0],
                   'count_since_1': [0,1,2,3,4,0,1,2,0,1]})
>>>
id value value_at_1 count_since_1
1 1 1 0
2 2.5 0 1
3 1.1 0 2
4 1.4 0 3
5 1.5 0 4
6 1 1 0
7 1.5 0 1
8 3 0 2
9 1 1 0
10 1.6 0 1
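For reference, the rule described above (count up from 0 on every row and reset to 0 whenever value_at_1 is 1) can be spelled out as a plain Python loop. This is a minimal, non-vectorized sketch that is handy for checking faster solutions against; the helper name count_since_flag is hypothetical, and it assumes rows before the first 1 simply count up from 0.

```python
import numpy as np
import pandas as pd


def count_since_flag(flags):
    """Hypothetical helper: a counter that restarts at 0 on every row where flag == 1."""
    counts = []
    current = 0
    for flag in flags:
        if flag == 1:
            current = 0  # reset on a 1
        counts.append(current)
        current += 1  # next row is one further from the last 1
    return counts


df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'value': [1, 2.5, 1.1, 1.4, 1.5, 1, 1.5, 3, 1, 1.6]})
df['value_at_1'] = np.where(df['value'] == 1, 1, 0)
df['count_since_1'] = count_since_flag(df['value_at_1'])
print(df['count_since_1'].tolist())  # [0, 1, 2, 3, 4, 0, 1, 2, 0, 1]
```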
Can anyone help me manipulate the data this way? Thanks!
Answer 0 (score: 4)
Use cumcount inside a groupby, grouping on the cumulative sum of value_at_1 so that every 1 starts a new group:
df.assign(
    count_since_1=df.value_at_1.groupby(df.value_at_1.cumsum()).cumcount())
id value value_at_1 count_since_1
0 1 1.0 1 0
1 2 2.5 0 1
2 3 1.1 0 2
3 4 1.4 0 3
4 5 1.5 0 4
5 6 1.0 1 0
6 7 1.5 0 1
7 8 3.0 0 2
8 9 1.0 1 0
9 10 1.6 0 1
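To see why this works, the sketch below rebuilds the example frame and prints the intermediate group key. It assumes the same column names as the question; the variable name group_key is only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'value': [1, 2.5, 1.1, 1.4, 1.5, 1, 1.5, 3, 1, 1.6]})
df['value_at_1'] = np.where(df['value'] == 1, 1, 0)

# Every 1 bumps the running total, so each block that starts at a 1
# gets its own group label
group_key = df['value_at_1'].cumsum()

# cumcount numbers the rows within each group, starting from 0
df['count_since_1'] = df['value_at_1'].groupby(group_key).cumcount()

print(group_key.tolist())            # [1, 1, 1, 1, 1, 2, 2, 2, 3, 3]
print(df['count_since_1'].tolist())  # [0, 1, 2, 3, 4, 0, 1, 2, 0, 1]
```

Because the key only changes on a 1, any rows before the first 1 fall into group 0 and are also numbered from 0.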
Answer 1 (score: 2)
Just wanted to offer a different approach:
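import functools

import numpy as np
import pandas as pd

# Positions of the rows where value_at_1 == 1, plus the end of the frame
# (assumes a default RangeIndex and that the first row has value_at_1 == 1)
idx = df.index[df['value_at_1'].eq(1)].values.tolist() + [len(df)]
# Length of each run that starts at a 1
idx = list(np.diff(idx))
# Concatenate 0..n-1 for every run length n
df['count_since_1'] = functools.reduce(lambda x, y: x + y, [list(range(y)) for y in idx])
df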
Out[945]:
id value value_at_1 count_since_1
0 1 1.0 1 0
1 2 2.5 0 1
2 3 1.1 0 2
3 4 1.4 0 3
4 5 1.5 0 4
5 6 1.0 1 0
6 7 1.5 0 1
7 8 3.0 0 2
8 9 1.0 1 0
9 10 1.6 0 1
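The run-length idea above can also be expressed without building Python lists. The sketch below swaps in a different technique, not the one from this answer: it records the position of every 1, carries the most recent one forward with np.maximum.accumulate, and subtracts it from each row's position. The helper name count_since_flag_np is hypothetical.

```python
import numpy as np
import pandas as pd


def count_since_flag_np(flags):
    """Hypothetical helper: rows elapsed since the most recent flag == 1."""
    flags = np.asarray(flags)
    pos = np.arange(len(flags))
    # Each row's own position if it is a 1, else 0
    # (so rows before the first 1 are counted from row 0)
    last_one = np.where(flags == 1, pos, 0)
    # Carry the position of the most recent 1 forward
    last_one = np.maximum.accumulate(last_one)
    return pos - last_one


df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'value': [1, 2.5, 1.1, 1.4, 1.5, 1, 1.5, 3, 1, 1.6]})
df['value_at_1'] = np.where(df['value'] == 1, 1, 0)
df['count_since_1'] = count_since_flag_np(df['value_at_1'])
print(df['count_since_1'].tolist())  # [0, 1, 2, 3, 4, 0, 1, 2, 0, 1]
```

Unlike the list-based version, which assumes the first row has value_at_1 == 1 (otherwise the built list is shorter than the frame), this sketch also handles leading rows without a 1: count_since_flag_np([0, 0, 1, 0]) gives [0, 1, 0, 1].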