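I am trying to find a way to generate values for caseid in a very large dataset. I want the caseid variable to do two things: (1) increase by 1 when y = 1. Importantly, the value of caseid should increase in the row after y = 1 is observed. (2) Increase by 1 when the value of case changes, i.e. from A to B.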
Sample data, including the caseid values I expect:
import pandas as pd

case = pd.Series(['A', 'A', 'A', 'A',
                  'B', 'B', 'B', 'B',
                  'C', 'C', 'C', 'C'])
y = pd.Series([0, 1, 0, 0,
               0, 1, 0, 0,
               0, 0, 1, 0])
year = [2016, 2017, 2018, 2019,
        2016, 2017, 2018, 2019,
        2016, 2017, 2018, 2019]
# Desired result: a new id after each y == 1 row and at each change of case
caseid = pd.Series([1, 1, 2, 2,
                    3, 3, 4, 4,
                    5, 5, 5, 6])
data = {'case': case, 'y': y, 'year': year, 'caseid': caseid}
df = pd.DataFrame(data)
   case  y  year  caseid
0     A  0  2016       1
1     A  1  2017       1
2     A  0  2018       2
3     A  0  2019       2
4     B  0  2016       3
5     B  1  2017       3
6     B  0  2018       4
7     B  0  2019       4
8     C  0  2016       5
9     C  0  2017       5
10    C  1  2018       5
11    C  0  2019       6
Thank you very much for your generous help!
Answer 0 (score: 1)
Use a boolean mask together with DataFrame.cumsum: flag every row where case differs from the previous row or where the previous row has y = 1, then take the cumulative sum of those flags.
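The mask-and-cumsum idea can be sketched as follows. This is a minimal sketch rather than the answerer's original code: the intermediate names case_changed and after_event are made up for illustration, and it assumes the rows are already sorted by case and year as in the question's sample.

```python
import pandas as pd

# Sample data from the question (without the hand-written caseid column)
df = pd.DataFrame({
    "case": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "y": [0, 1, 0, 0,
          0, 1, 0, 0,
          0, 0, 1, 0],
    "year": [2016, 2017, 2018, 2019] * 3,
})

# True on the first row of each case; row 0 is also True because
# shift() yields NaN there and "A" != NaN
case_changed = df["case"].ne(df["case"].shift())

# True on the row immediately after a y == 1 observation
after_event = df["y"].shift(fill_value=0).eq(1)

# Each True starts a new id; cumsum counts the Trues seen so far
df["caseid"] = (case_changed | after_event).cumsum()
print(df)
```

On the sample data this reproduces the caseid column from the question. Note that if y = 1 falls on the last row of a case, both flags are True on the next row and the id still increases by only one; whether that is the desired behaviour is not stated in the question.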
Answer 1 (score: 1)
This works, provided the whole boolean condition is wrapped in parentheses before .cumsum() is called.
Credit: @Quang Hoang (whose version was only missing the parentheses)
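Why the parentheses matter can be illustrated with a short sketch. The variable names here are hypothetical and the code is not taken from either answer. In Python a method call binds more tightly than the | operator, so without the outer parentheses cumsum() would apply only to the second condition. Building the mask in a separate step avoids the issue entirely.

```python
import pandas as pd

# case and y columns from the question's sample data
case = pd.Series(["A"] * 4 + ["B"] * 4 + ["C"] * 4)
y = pd.Series([0, 1, 0, 0,
               0, 1, 0, 0,
               0, 0, 1, 0])

changed = case.ne(case.shift())            # start of a new case
after_event = y.shift(fill_value=0).eq(1)  # row right after y == 1

# Two-step form: combine the conditions first, then accumulate
mask = changed | after_event
caseid = mask.cumsum()

# One-line form: the outer parentheses make cumsum() apply to the
# combined mask rather than to after_event alone
caseid_one_liner = (changed | after_event).cumsum()

print(caseid.tolist())
```

Both forms give the same ids on the sample data.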