Question

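I have a pandas DataFrame with several columns, and I am interested in one particular column that holds a series of 1s and 0s. The logic I want to implement is: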

If (the current row is 1 and the next row is 0):
    count = count + 1
else :
    pass
df['NewCol'] = count

So, this is what I tried:

secCnt = 0 
def sectionCount(data):
    global secCnt
    if( (data[['secFlg']] == 0) and (data[['secFlg'].shift(-1)] == 1) ):
        secCnt = secCnt + 1 
    else:
        pass
    return secCnt


if __name__ == "__main__":
    df['SectionIndex'] = df.apply(sectionCount(df), axis=1)

I get this error:

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

P.S. I am new to pandas. I am extracting text from a PDF file and want to find the sections in it.

Answer 1

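I think you need to create a boolean mask by comparing the column with 0 and the shifted values with 1, chain the two conditions with & (bitwise AND), and then count with cumsum: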

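import numpy as np
import pandas as pd

# Reproducible random 0/1 flags for the demonstration
np.random.seed(1213)

df = pd.DataFrame({'secFlg': np.random.randint(2, size=20)})

# True where the current row is 0 and the previous row was 1;
# cumsum turns those True values into a running count
df['SectionIndex'] = ((df['secFlg'] == 0) & (df['secFlg'].shift() == 1)).cumsum()
print(df)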
    secFlg  SectionIndex
0        0             0
1        1             0
2        1             0
3        1             0
4        0             1
5        0             1
6        0             1
7        0             1
8        0             1
9        1             1
10       0             2
11       0             2
12       0             2
13       0             2
14       1             2
15       1             2
16       1             2
17       0             3
18       1             3
19       0             4
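
For context, the ValueError in the question comes from using `and` on the result of a DataFrame comparison, which Python tries to reduce to a single True/False; in addition, `df.apply(sectionCount(df), axis=1)` calls the function once on the whole frame instead of passing the function itself. Below is a minimal sketch on a small hand-written column (the data and the names ends_prev, ends_next and SectionIndexNext are only assumptions for illustration). It shows the mask from above next to a variant that follows the question's pseudocode more literally (current row is 1 and the next row is 0), using shift(-1) to look one row ahead:

```python
import pandas as pd

# Small hand-written flag column (hypothetical data, not from the question)
df = pd.DataFrame({"secFlg": [1, 1, 0, 0, 1, 0, 1, 1, 0]})

# Mask from the answer: current row is 0 and the previous row was 1
ends_prev = (df["secFlg"] == 0) & (df["secFlg"].shift() == 1)

# Variant following the question's pseudocode:
# current row is 1 and the next row is 0 (shift(-1) looks one row ahead)
ends_next = (df["secFlg"] == 1) & (df["secFlg"].shift(-1) == 0)

# cumsum counts True as 1, giving a running section counter
df["SectionIndex"] = ends_prev.cumsum()
df["SectionIndexNext"] = ends_next.cumsum()
print(df)
```

The two counters reach the same totals, but the second one increments one row earlier, on the last 1 before a run of 0s. Which one fits depends on whether the counter should change on the last row of a section or on the first row after it.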
