Pandas: return rows from a dataframe where multiple subsets of columns are not zero

Time: 2018-08-08 20:54:33

Tags: python pandas dataframe

I have a dataframe called df.

The columns in the dataframe can be grouped logically, so I have grouped the column names into the lists A, B and C, where:

A = ['column_1', 'column_2', 'column_3']
B = ['column_4', 'column_5', 'column_6']
C = ['column_7', 'column_8', 'column_9']

Besides the columns column_1 through column_9, df also has a column called "filename_ID", which serves as the index and is therefore not grouped. The columns column_1 through column_9 contain only the values 0 and 1.

Now I want to filter the dataframe so that it only contains rows that have at least one non-zero value in each group (A, B, C). In other words, I only want to keep the rows whose filename_ID satisfies this condition.

I managed to create a separate dataframe for each group:

df_A = df.loc[(df[A]!=0).any(axis=1)]
df_B = df.loc[(df[B]!=0).any(axis=1)]
df_C = df.loc[(df[C]!=0).any(axis=1)]

However, I don't know how to apply all of these conditions at once, i.e. how to create a single new dataframe whose rows all have at least one non-zero value in each logical column group.

3 Answers:

Answer 0 (score: 3)

Setup

import numpy as np
import pandas as pd

np.random.seed([3, 1415])

df = pd.DataFrame(
    np.random.randint(2, size=(10, 9)),
    columns=[f"col{i + 1}" for i in range(9)]
)

df

   col1  col2  col3  col4  col5  col6  col7  col8  col9
0     0     1     0     1     0     0     1     0     1
1     1     1     1     0     1     1     0     1     0
2     0     0     0     0     0     0     0     0     0
3     1     0     1     1     1     1     0     0     0
4     0     0     1     1     1     1     1     0     1
5     1     1     0     1     1     1     1     1     1
6     1     0     1     0     0     0     1     1     0
7     0     0     0     0     0     1     0     1     0
8     1     0     1     0     1     0     0     1     1
9     1     0     1     0     0     1     0     1     0

Solution

Create a dictionary that maps each column to its group label:

m = {
    **dict.fromkeys(['col1', 'col2', 'col3'], 'A'),
    **dict.fromkeys(['col4', 'col5', 'col6'], 'B'),
    **dict.fromkeys(['col7', 'col8', 'col9'], 'C'),
}
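
For reference, this mapping simply assigns a group label to every column name:

# m is equivalent to:
# {'col1': 'A', 'col2': 'A', 'col3': 'A',
#  'col4': 'B', 'col5': 'B', 'col6': 'B',
#  'col7': 'C', 'col8': 'C', 'col9': 'C'}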

Then groupby over axis=1 based on that dictionary and keep only the rows where every group has at least one non-zero value:

df[df.groupby(m, axis=1).any().all(1)]

   col1  col2  col3  col4  col5  col6  col7  col8  col9
0     0     1     0     1     0     0     1     0     1
1     1     1     1     0     1     1     0     1     0
4     0     0     1     1     1     1     1     0     1
5     1     1     0     1     1     1     1     1     1
8     1     0     1     0     1     0     0     1     1
9     1     0     1     0     0     1     0     1     0

Note the rows that didn't make it:

   col1  col2  col3  col4  col5  col6  col7  col8  col9
2     0     0     0     0     0     0     0     0     0
3     1     0     1     1     1     1     0     0     0
6     1     0     1     0     0     0     1     1     0
7     0     0     0     0     0     1     0     1     0
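
To see why rows 2, 3, 6 and 7 are dropped, it helps to look at the intermediate per-group reduction. With the setup frame above, df.groupby(m, axis=1).any() comes out as:

       A      B      C
0   True   True   True
1   True   True   True
2  False  False  False
3   True   True  False
4   True   True   True
5   True   True   True
6   True  False   True
7  False   True   True
8   True   True   True
9   True   True   True

The trailing .all(1) then keeps only the rows where all three group flags are True (rows 0, 1, 4, 5, 8 and 9).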

You could also keep the columns in lists and build the same mapping from them (here the group labels are simply the list positions 0, 1, 2), then do the same thing:

cols = [['col1', 'col2', 'col3'], ['col4', 'col5', 'col6'], ['col7', 'col8', 'col9']]
m = {k: v for v, c in enumerate(cols) for k in c}

Answer 1 (score: 1)

Try the following:

import numpy as np

column_groups = [A, B, C]
# one boolean mask per group, True where the row has at least one non-zero value
masks = [(df[cols] != 0).any(axis=1) for cols in column_groups]
full_mask = np.logical_and.reduce(masks)
full_df = df[full_mask]
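
If you would rather not pull in numpy for the reduction, the same masks can be combined with plain pandas. This is a small sketch assuming the masks list from above:

import pandas as pd

# stack the per-group masks side by side, then require all of them to be True
full_mask = pd.concat(masks, axis=1).all(axis=1)
full_df = df[full_mask]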

Answer 2 (score: 1)

I created a csv file with some sample data.

Sample input:


ID  a1  a2  a3  a4  a5  a6  a7  a8  a9
1   1   1   1   1   1   1   1   1   1
2   0   0   0   1   0   0   0   1   0
3   0   1   0   0   0   0   1   0   0
4   0   0   0   0   1   0   1   0   1
5   1   1   0   1   1   1   1   0   1
6   0   0   0   0   1   0   0   1   0
7   1   0   1   1   1   0   1   1   1
8   1   1   1   0   1   1   1   0   1
9   0   0   0   1   0   1   0   0   0
10  0   0   1   0   0   0   0   0   0
11  1   0   1   0   1   1   0   1   1
12  1   1   0   1   0   1   1   0   1

import pandas as pd

df = pd.read_csv('check.csv')
# per-group totals of the 0/1 flags
df['sumA'] = df.a1 + df.a2 + df.a3
df['sumB'] = df.a4 + df.a5 + df.a6
df['sumC'] = df.a7 + df.a8 + df.a9
# keep rows where every group has at least one non-zero value, then drop the helpers
new_df = df[(df.sumA > 0) & (df.sumB > 0) & (df.sumC > 0)]
new_df = new_df.drop(['sumA', 'sumB', 'sumC'], axis=1)
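
The same filter can also be written without the temporary sum columns. This is a sketch assuming the same check.csv with the a1..a9 layout shown above:

import pandas as pd

df = pd.read_csv('check.csv')
groups = [['a1', 'a2', 'a3'], ['a4', 'a5', 'a6'], ['a7', 'a8', 'a9']]
# one boolean column per group, then require all of them to be True
mask = pd.concat([(df[g] != 0).any(axis=1) for g in groups], axis=1).all(axis=1)
new_df = df[mask]

With the sample input above, either version should keep the rows with ID 1, 5, 7, 8, 11 and 12.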