Find the start and end index of consecutive values in a pandas DataFrame

Time: 2019-10-21 10:07:43

Tags: python pandas

I have the following DataFrame:

     A    B    C
0    1    1    1
1    0    1    0
2    1    1    1
3    1    0    1
4    1    1    0
5    1    1    0 
6    0    1    1
7    0    1    0
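
To make the question reproducible, the frame above could be built as follows. This is only a sketch: the variable name df comes from the question, and the default integer index 0..7 is assumed from the printed table.

```python
import pandas as pd

# Same values as the table above; the default RangeIndex gives labels 0..7.
df = pd.DataFrame({
    "A": [1, 0, 1, 1, 1, 1, 0, 0],
    "B": [1, 1, 1, 0, 1, 1, 1, 1],
    "C": [1, 0, 1, 1, 0, 0, 1, 0],
})
print(df)
```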

For each column, I want to find the start and end index of every run of three or more consecutive values equal to 1. Expected result:

Column    From    To    
     A       2     5
     B       0     2
     B       4     7

First, I filter out the 1s that are not part of a run of three or more consecutive values:

filtered_df = df.copy().apply(filter, threshold=3)

where

def filter(col, threshold=3):  
    mask = col.groupby((col != col.shift()).cumsum()).transform('count').lt(threshold)
    mask &= col.eq(1)
    col.update(col.loc[mask].replace(1,0))
    return col
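
The grouping key inside filter may be easier to follow on a single column. The sketch below applies the same shift/cumsum idea to column A only; the names run_id, run_len and too_short are illustrative and not part of the question.

```python
import pandas as pd

col = pd.Series([1, 0, 1, 1, 1, 1, 0, 0], name="A")

# A new run starts wherever the value differs from the previous row;
# the cumulative sum of those change points gives each run its own id.
run_id = (col != col.shift()).cumsum()

# Length of the run that each row belongs to.
run_len = col.groupby(run_id).transform("count")

# Rows holding a 1 that sits in a run shorter than three.
too_short = run_len.lt(3) & col.eq(1)

print(pd.DataFrame({"A": col, "run_id": run_id,
                    "run_len": run_len, "too_short": too_short}))
```

Only row 0 is flagged here, which is why that row is set to 0 in column A of the filtered frame.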

filtered_df now looks like this:

     A    B    C
0    0    1    0
1    0    1    0
2    1    1    0
3    1    0    0
4    1    1    0
5    1    1    0 
6    0    1    0
7    0    1    0

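If the DataFrame had only a single column of zeros and ones, the result could be obtained as in "How to use pandas to find consecutive same data in time series". However, I am struggling to do something similar for multiple columns at once.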
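
For reference, the single-column case might look like the sketch below. It is not taken from the linked question; the helper name runs_of_ones and its threshold parameter are hypothetical, and the data is column B from the frame above.

```python
import pandas as pd

def runs_of_ones(s, threshold=3):
    """Return (start, end) index pairs for runs of 1s at least `threshold` long."""
    is_one = s.eq(1)
    # Run id per row: increments every time the value changes.
    run_id = (s != s.shift()).cumsum()
    # Index labels of the rows that hold a 1.
    idx = s.index.to_series()[is_one]
    # First label, last label and length of every run of 1s.
    bounds = idx.groupby(run_id[is_one]).agg(["first", "last", "count"])
    keep = bounds[bounds["count"] >= threshold]
    return list(zip(keep["first"].tolist(), keep["last"].tolist()))

s = pd.Series([1, 1, 1, 0, 1, 1, 1, 1], name="B")
print(runs_of_ones(s))
```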

2 answers:

Answer 0: (score: 2)

Use DataFrame.pipe to apply a function to the whole DataFrame.

In the first solution, get the first and last index value of each run of consecutive 1s per column, append each output to a list, and concat the list at the end. Alternatively, reshape the DataFrame first and then apply the same solution. The first solution:

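import pandas as pd

def f(df, threshold=3):
    out = []
    for col in df.columns:
        # rows where the column equals 1
        m = df[col].eq(1)
        # run id per row (changes whenever the value changes), kept only for the 1s
        g = (df[col] != df[col].shift()).cumsum()[m]
        # keep only runs with at least `threshold` rows
        mask = g.groupby(g).transform('count').ge(threshold)
        filt = g[mask].reset_index()
        # first and last index label of each remaining run
        output = filt.groupby(col)['index'].agg(['first','last'])
        output.insert(0, 'col', col)
        out.append(output)

    return pd.concat(out, ignore_index=True)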
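
The reshaping variant mentioned in this answer is not preserved in this text, so the following is only a sketch of what such an approach could look like. It assumes DataFrame.melt for the reshape, and the names runs_long, idx, val and run are hypothetical. It also shows the function being called through DataFrame.pipe.

```python
import pandas as pd

def runs_long(df, threshold=3):
    # Reshape to long format: one row per (column, index) pair,
    # columns stacked one after another in their original order.
    long_df = (df.rename_axis("idx").reset_index()
                 .melt(id_vars="idx", var_name="col", value_name="val"))
    # A new run starts when the value changes or a new column begins.
    new_run = ((long_df["val"] != long_df["val"].shift())
               | (long_df["col"] != long_df["col"].shift()))
    long_df["run"] = new_run.cumsum()
    ones = long_df[long_df["val"].eq(1)]
    out = (ones.groupby(["col", "run"])["idx"]
               .agg(["first", "last", "count"])
               .reset_index())
    out = out[out["count"] >= threshold]
    return out[["col", "first", "last"]].reset_index(drop=True)

df = pd.DataFrame({
    "A": [1, 0, 1, 1, 1, 1, 0, 0],
    "B": [1, 1, 1, 0, 1, 1, 1, 1],
    "C": [1, 0, 1, 1, 0, 0, 1, 0],
})
result = df.pipe(runs_long, threshold=3)
print(result)
```

Checking the column label as well as the value when detecting run boundaries matters here: column B ends with a 1 and column C starts with a 1, and without that check the two would merge into one run.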

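Answer 1: (score: 1)

You can use rolling to create a window over the DataFrame. You can then apply all your conditions and shift the result back to the window's starting position: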

length = 3
window = df.rolling(length)
mask = (window.min() == 1) & (window.max() == 1)
mask = mask.shift(1 - length)
print(mask)

This prints:

       A      B      C
0  False   True  False
1  False  False  False
2   True  False  False
3   True  False  False
4  False   True  False
5  False   True  False
6    NaN    NaN    NaN
7    NaN    NaN    NaN
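
The mask marks where a full window of 1s begins, so one more step is needed to get the From/To table asked for in the question. The sketch below is one possible way to do that, not part of the answer. It assumes a default integer index (so that adding length - 1 to a label is meaningful) and uses fill_value=False in shift to keep the mask boolean instead of introducing NaN.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 0, 1, 1, 1, 1, 0, 0],
    "B": [1, 1, 1, 0, 1, 1, 1, 1],
    "C": [1, 0, 1, 1, 0, 0, 1, 0],
})
length = 3

window = df.rolling(length)
mask = (window.min() == 1) & (window.max() == 1)
# True at row i now means rows i .. i+length-1 are all 1.
mask = mask.shift(1 - length, fill_value=False).astype(bool)

rows = []
for col in mask.columns:
    starts = mask[col]
    # Consecutive True window starts belong to the same run of 1s.
    block = (starts != starts.shift()).cumsum()[starts]
    for _, grp in block.groupby(block):
        # The run ends length-1 rows after its last window start.
        rows.append((col, grp.index[0], grp.index[-1] + length - 1))

result = pd.DataFrame(rows, columns=["Column", "From", "To"])
print(result)
```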