Question

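I have a DataFrame df with a column X. I want to use X to create a new column Y. Y should be 1 if X in the same row is 1 and the run of 0s directly above it (also in X) is at least n long, where n is variable. If there are fewer than n zeros above, Y should be "". I tried np.where for a few hours without success. I think I need a lambda function, but I don't know where to start or what to search for.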

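Example with n = 4:

On 2018-01-25 the result is 1, because X is 1 and there are more than 4 zeros directly above it.

On 2018-02-02 the result is "", because there are only 3 zeros above it (not 4).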

 Dates        X    Y (like it should be...)
2018-01-02    0
2018-01-03    0
2018-01-04    0
2018-01-05    0
2018-01-08    0
2018-01-09    0
2018-01-10    0
2018-01-11    0
2018-01-12    0
2018-01-15    0
2018-01-16    0
2018-01-17    0
2018-01-18    0
2018-01-19    0
2018-01-22    0
2018-01-23    0
2018-01-24    0
2018-01-25    1  1
2018-01-29    0  
2018-01-30    0  
2018-01-31    0  
2018-02-02    1  
2018-02-05    0  
2018-02-06    0
2018-02-07    0
2018-02-08    0
2018-02-09    1  1
2018-02-12    1
2018-02-13    0

Answer 1

IIUC，

We can group on a temporary key, then use cumsum + cumcount to measure each run of equal values and match on the condition.
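For intuition, here is the run-counting idea by itself on a toy series. The values and the names run_id and run_len are made up for illustration and are not taken from the question:

```python
import pandas as pd

# Toy series, not the question's data.
x = pd.Series([0, 0, 1, 0, 0, 0, 1, 1])

# A new run starts wherever the value differs from the previous row;
# the cumulative sum of those change flags gives each run its own id.
run_id = x.ne(x.shift()).cumsum()

# cumcount numbers the rows inside each run from 0, so adding 1 gives
# the running length of the current run.
run_len = x.groupby(run_id).cumcount() + 1

print(pd.DataFrame({"X": x, "run_id": run_id, "run_len": run_len}))
```

The last row of a run of zeros then carries the length of that run, which is the number that gets compared against n.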

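n = 4  # minimum number of zeros required directly above a 1

# Give every run of equal consecutive X values its own id.
s = df['X'].ne(df['X'].shift()).cumsum()

# Running length of the current run, kept in a temporary Count column.
df['Count'] = df.groupby([df.X, s]).cumcount() + 1

# Rows directly above each X == 1 row (assumes the default 0..N-1 index).
prev = df.loc[df['X'] == 1].index - 1
prev = prev[prev >= 0]  # an X == 1 in the first row has nothing above it

# Keep only previous rows that are zeros ending a run of at least n.
above = df.iloc[prev]
matches = above.loc[(above['X'] == 0) & (above['Count'] >= n)].index

df.loc[matches + 1, 'Y'] = 1  # create the Y column

df.drop('Count', axis=1, inplace=True)  # drop the temporary Count column

print(df[df['Y'] == 1])  # show the matched rows
         Dates  X    Y
17  2018-01-25  1  1.0
26  2018-02-09  1  1.0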
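
If relying on integer index arithmetic is a concern, a shift-based variant is one option. This is only a sketch: the function name flag_ones_after_zero_run is made up here, the X values are typed in from the question's table, and it returns "" for non-matching rows as the question asks, rather than NaN.

```python
import pandas as pd


def flag_ones_after_zero_run(x: pd.Series, n: int) -> pd.Series:
    """Return 1 where x == 1 and at least n zeros sit directly above, else ""."""
    # Id and running length of each run of equal consecutive values.
    run_id = x.ne(x.shift()).cumsum()
    run_len = x.groupby(run_id).cumcount() + 1

    # Length of the zero run ending on each row (0 where x is not 0).
    zero_run = run_len.where(x.eq(0), 0)

    # Shift down one row so each row sees the zero run that ends just above it.
    zeros_above = zero_run.shift(fill_value=0)

    hit = x.eq(1) & zeros_above.ge(n)
    return hit.map({True: 1, False: ""})


# X column from the question's table: 17 zeros, 1, 3 zeros, 1, 4 zeros, 1, 1, 0.
x = pd.Series([0] * 17 + [1] + [0] * 3 + [1] + [0] * 4 + [1, 1, 0], name="X")
y = flag_ones_after_zero_run(x, n=4)
print(y[y == 1].index.tolist())  # [17, 26]
```

Because the comparison uses shift instead of index - 1, this version should not depend on the frame having a default integer index.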
