How to increment a count in one column given the value of another column?

Posted: 2019-04-16 04:03:49

Tags: python pandas

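I am trying to find a way to generate the values of caseid in a very large dataset. I want the caseid variable to do two things: (1) increment when y = 1. Importantly, the increment should take effect in the row after y = 1 is observed, not in the row where y = 1 itself; and (2) increment when the value of case changes, i.e., from A to B.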

Example data (caseid is the column I want to generate):

import pandas as pd

case = pd.Series(['A', 'A', 'A', 'A', 
                  'B', 'B', 'B', 'B', 
                  'C', 'C', 'C', 'C'])
y = pd.Series([0, 1, 0, 0, 
               0, 1, 0, 0, 
               0, 0, 1, 0])
year = [2016, 2017, 2018, 2019, 
        2016, 2017, 2018, 2019,
        2016, 2017, 2018, 2019]
# desired output
caseid = pd.Series([1, 1, 2, 2,
                    3, 3, 4, 4,
                    5, 5, 5, 6])
data = {'case': case, 'y': y, 'year': year, 'caseid': caseid}  # avoid shadowing the built-in dict
df = pd.DataFrame(data)

   case  y  year  caseid
0     A  0  2016       1
1     A  1  2017       1
2     A  0  2018       2
3     A  0  2019       2
4     B  0  2016       3
5     B  1  2017       3
6     B  0  2018       4
7     B  0  2019       4
8     C  0  2016       5
9     C  0  2017       5
10    C  1  2018       5
11    C  0  2019       6
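To make the two rules concrete, here is a minimal row-by-row sketch that reproduces the expected caseid column above. It assumes the rows are already sorted by case and then by year, as in the example; make_caseid is a hypothetical helper name. A Python loop like this is slow on a very large dataset, so it is only meant to pin down the intended logic.

```python
import pandas as pd

# Rebuild the example frame from the question, without the expected caseid column
df = pd.DataFrame({
    'case': list('AAAABBBBCCCC'),
    'y': [0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0],
    'year': [2016, 2017, 2018, 2019] * 3,
})

def make_caseid(cases, ys):
    """Hypothetical helper: apply the two increment rules row by row.

    Assumes rows are sorted by case (and by year within each case).
    """
    ids = []
    current = 0
    prev_case = None
    prev_y = None
    for c, y in zip(cases, ys):
        # Rule (2): a change in case starts a new id.
        # Rule (1): the row right after a row with y == 1 starts a new id.
        if c != prev_case or prev_y == 1:
            current += 1
        ids.append(current)
        prev_case, prev_y = c, y
    return ids

df['caseid'] = make_caseid(df['case'], df['y'])
print(df['caseid'].tolist())  # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 5, 6]
```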

Many thanks for your generous help!

2 answers:

Answer 0 (score: 1)

Create a boolean mask and use it together with cumsum:
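The original code for this answer is missing from this copy. The following is a minimal sketch of the boolean-mask-plus-cumsum idea it describes, assuming the frame is sorted by case and then by year. Each True in the mask marks a row that should start a new caseid, and the cumulative sum of the mask turns those flags into running ids.

```python
import pandas as pd

df = pd.DataFrame({
    'case': list('AAAABBBBCCCC'),
    'y': [0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0],
    'year': [2016, 2017, 2018, 2019] * 3,
})

# True on the first row of each new case; shift() yields NaN on row 0,
# so ne() is True there and the first id starts at 1
new_case = df['case'].ne(df['case'].shift())
# True on the row directly after a row where y == 1
after_event = df['y'].shift().eq(1)
# Every True starts a new id; cumsum counts them cumulatively as 1, 2, 3, ...
df['caseid'] = (new_case | after_event).cumsum()
print(df['caseid'].tolist())  # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 5, 6]
```

Because the two flags are combined with |, a y = 1 on the last row of a case does not cause a double increment: the next row is flagged both as a new case and as "after y = 1", but it still adds only 1 to the running count. Everything is vectorized, so this scales to large frames far better than a Python loop.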

Answer 1 (score: 1)

This works:


Credit: @Quang Hoang (only the parentheses were missing)
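The code for this answer did not survive in this copy. The credit line suggests the fix was adding parentheses, so the sketch below is an assumption about what that looked like: the same mask as a shift/cumsum approach, written with comparison operators instead of the eq/ne methods. In Python, | binds more tightly than == and !=, so each comparison has to be parenthesized.

```python
import pandas as pd

df = pd.DataFrame({
    'case': list('AAAABBBBCCCC'),
    'y': [0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0],
    'year': [2016, 2017, 2018, 2019] * 3,
})

# Without the parentheses, df['y'].shift() == 1 | df['case'] != df['case'].shift()
# is parsed as the chained comparison
# df['y'].shift() == (1 | df['case']) != df['case'].shift(), which fails.
mask = (df['y'].shift() == 1) | (df['case'] != df['case'].shift())
df['caseid'] = mask.cumsum()
print(df['caseid'].tolist())  # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 5, 6]
```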