I have the following DataFrame:
A B C
0 1 1 1
1 0 1 0
2 1 1 1
3 1 0 1
4 1 1 0
5 1 1 0
6 0 1 1
7 0 1 0
For each column, I want the start and end index of every run of three or more consecutive values equal to 1. Expected result:
Column From To
A 2 5
B 0 2
B 4 7
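First, I set to 0 every 1 that is not part of a run of three or more consecutive values:
filtered_df = df.copy().apply(filter, threshold=3)
where
def filter(col, threshold=3):
    # True for rows whose run of identical consecutive values is shorter than threshold
    mask = col.groupby((col != col.shift()).cumsum()).transform('count').lt(threshold)
    # Only short runs of 1s are affected
    mask &= col.eq(1)
    # Replace those 1s with 0
    col.update(col.loc[mask].replace(1, 0))
    return col
filtered_df now looks like: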
A B C
0 0 1 0
1 0 1 0
2 1 1 0
3 1 0 0
4 1 1 0
5 1 1 0
6 0 1 0
7 0 1 0
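The filter hinges on the expression (col != col.shift()).cumsum(), which gives every run of identical consecutive values its own integer label. Grouping by that label then yields each run's length. Here is a minimal illustration on column A of the data above. The names a, labels and run_len are illustrative only.

```python
import pandas as pd

# Column A from the question's DataFrame
a = pd.Series([1, 0, 1, 1, 1, 1, 0, 0], name="A")

# A new label starts whenever a value differs from the previous one,
# so every run of identical consecutive values shares one label
labels = (a != a.shift()).cumsum()
print(labels.tolist())   # [1, 2, 3, 3, 3, 3, 4, 4]

# Length of the run each row belongs to
run_len = a.groupby(labels).transform("count")
print(run_len.tolist())  # [1, 1, 4, 4, 4, 4, 2, 2]
```

Rows 2 to 5 share label 3 and a run length of 4, so they survive a threshold of 3. The single 1 at row 0 has run length 1 and is zeroed.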
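If the DataFrame had only one column of zeros and ones, the result could be obtained as in How to use pandas to find consecutive same data in time series. However, I am struggling to do the same for several columns at once.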
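One way to extend the single-column 0/1 idea to every column works as follows. Pad each column with a 0 at both ends and take np.diff. A +1 then marks the first 1 of a run, and a -1 marks the position just after the last 1. Only runs of at least threshold rows are kept. This is a sketch that assumes the columns contain only 0 and 1 and that the rows are in order. The helper name runs_of_ones is hypothetical. Because it applies the threshold itself, it works directly on the original df without the separate filtering step.

```python
import numpy as np
import pandas as pd

def runs_of_ones(df, threshold=3):
    """Hypothetical helper: one row per run of >= threshold consecutive 1s."""
    rows = []
    for col in df.columns:
        # Pad with a 0 on both sides so runs touching the first/last row are closed
        v = np.concatenate(([0], df[col].to_numpy(), [0]))
        d = np.diff(v)
        starts = np.flatnonzero(d == 1)      # position of the first 1 of each run
        ends = np.flatnonzero(d == -1) - 1   # position of the last 1 of each run
        for s, e in zip(starts, ends):
            if e - s + 1 >= threshold:
                rows.append((col, df.index[s], df.index[e]))
    return pd.DataFrame(rows, columns=["Column", "From", "To"])

df = pd.DataFrame({
    "A": [1, 0, 1, 1, 1, 1, 0, 0],
    "B": [1, 1, 1, 0, 1, 1, 1, 1],
    "C": [1, 0, 1, 1, 0, 0, 1, 0],
})
result = runs_of_ones(df)
print(result)
```

For the question's data this gives the rows A 2 5, B 0 2 and B 4 7. Column C has no run of three 1s and so contributes nothing.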
Answer 0 (score: 2)
Use DataFrame.pipe to apply the function to the whole DataFrame.
The solution gets the first and last index of each run of consecutive 1s in every column, appends each per-column result to a list, and finally joins the list with concat. Alternatively, the DataFrame can be reshaped first and the same idea applied to the reshaped data.
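import pandas as pd

def f(df, threshold=3):
    out = []
    for col in df.columns:
        # True where the column equals 1
        m = df[col].eq(1)
        # Label runs of identical consecutive values; keep only the labels of the 1s
        g = (df[col] != df[col].shift()).cumsum()[m]
        # Keep runs with at least `threshold` rows
        mask = g.groupby(g).transform('count').ge(threshold)
        filt = g[mask].reset_index()
        # First and last index of each remaining run
        output = filt.groupby(col)['index'].agg(['first', 'last'])
        output.insert(0, 'col', col)
        out.append(output)
    return pd.concat(out, ignore_index=True)

df1 = df.pipe(f)
print(df1)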
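The answer also mentions reshaping the DataFrame first. Below is one possible sketch of that idea. It is not the answer author's code. It assumes DataFrame.unstack is used to produce a long Series indexed by (column, row) in column-major order. Runs are labelled across all columns in one pass, and then the 1-runs of at least threshold rows are aggregated. The name runs_long is hypothetical.

```python
import pandas as pd

def runs_long(df, threshold=3):
    """Hypothetical reshape-first variant: returns Column/From/To per run of 1s."""
    # Long format, column-major: MultiIndex (column, row) -> value
    s = df.unstack()
    # A new run starts when the value changes within a column; the first row
    # of every column is compared against NaN and so always starts a run
    new_run = s != s.groupby(level=0).shift()
    long = pd.DataFrame({
        "Column": s.index.get_level_values(0),
        "row": s.index.get_level_values(1),
        "value": s.to_numpy(),
        "run": new_run.cumsum().to_numpy(),
    })
    ones = long[long["value"].eq(1)]
    out = ones.groupby("run").agg(
        Column=("Column", "first"),
        From=("row", "first"),
        To=("row", "last"),
        n=("row", "count"),
    )
    # Keep only runs of at least `threshold` rows
    return out[out["n"].ge(threshold)].drop(columns="n").reset_index(drop=True)

df = pd.DataFrame({
    "A": [1, 0, 1, 1, 1, 1, 0, 0],
    "B": [1, 1, 1, 0, 1, 1, 1, 1],
    "C": [1, 0, 1, 1, 0, 0, 1, 0],
})
result = df.pipe(runs_long)
print(result)
```

The loop over columns is replaced by a single groupby. The trade-off is building an intermediate long frame with one row per cell.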
Answer 1 (score: 1)
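You can use rolling to create a window over the DataFrame. Then apply all conditions to each window and shift the result back to the row where the window starts: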
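length = 3
window = df.rolling(length)
# True where the window ending at this row contains only 1s
mask = (window.min() == 1) & (window.max() == 1)
# Move each result to the row where its window starts
mask = mask.shift(1 - length)
print(mask)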
Prints:
A B C
0 False True False
1 False False False
2 True False False
3 True False False
4 False True False
5 False True False
6 NaN NaN NaN
7 NaN NaN NaN
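The mask above marks the row where each all-1 window starts, so it does not directly give the From/To pairs. Here is a sketch of one way to turn it into intervals, assuming a default RangeIndex, since rolling windows are positional. Consecutive True rows belong to one run. That run starts at the first True row and ends length - 1 rows after the last True row. Passing fill_value=False to shift keeps the mask boolean instead of padding with NaN. The name intervals is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 0, 1, 1, 1, 1, 0, 0],
    "B": [1, 1, 1, 0, 1, 1, 1, 1],
    "C": [1, 0, 1, 1, 0, 0, 1, 0],
})

length = 3
window = df.rolling(length)
mask = (window.min() == 1) & (window.max() == 1)
# fill_value=False keeps the mask boolean instead of introducing NaN
mask = mask.shift(1 - length, fill_value=False)

rows = []
for col in mask.columns:
    # Positions where an all-1 window of `length` rows starts
    pos = np.flatnonzero(mask[col].to_numpy())
    # Split into blocks of consecutive positions; each block is one run
    breaks = np.flatnonzero(np.diff(pos) > 1) + 1
    for block in np.split(pos, breaks):
        if block.size:
            # The last window in the block ends length - 1 rows after it starts
            rows.append((col, df.index[block[0]], df.index[block[-1] + length - 1]))

intervals = pd.DataFrame(rows, columns=["Column", "From", "To"])
print(intervals)
```

Two separate runs can never produce adjacent True rows. At least one 0 separates them, so their window starts are more than one row apart, and splitting on gaps is enough.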