Pandas: comparing multiple columns

Time: 2020-02-05 15:52:10

Tags: python-3.x pandas dataframe

I have a DataFrame with more than 500 columns. How can I do a conditional comparison between two groups of columns?

column_list_a = ['A', 'B', ..... 'N']  (say 100 columns)
column_list_b = ['O', 'P', ..... 'Z']  (say 200 columns)

I tried:

df[column_list_a].lt(df[column_list_b])

But it does not work.

The goal is to check, for each row, whether all values in the column_list_a columns are less than all values in the column_list_b columns.

In addition, I want to store the binary result in a new column.

2 answers:

Answer 0: (score: 0)

OK, I found a workaround:

Now we have two new columns, each holding a list of the values from the 100 columns and the 200 columns respectively:

# One list per row, holding that row's values from each group of columns
df['combined1'] = other[column_list_a].values.tolist()
df['combined2'] = other[column_list_b].values.tolist()

This generalizes to any number of columns.
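
The answer stops at building the two list columns. One possible way to finish the comparison from there is sketched below. The toy data, the use of `df` as the source frame, and the result column name `a_lt_b` are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical toy data; column names and values are made up for illustration.
df = pd.DataFrame({"A": [1, 5, 2], "B": [2, 1, 3], "O": [4, 6, 3], "P": [5, 7, 9]})
column_list_a = ["A", "B"]
column_list_b = ["O", "P"]

# One list per row for each group of columns.
df["combined1"] = df[column_list_a].values.tolist()
df["combined2"] = df[column_list_b].values.tolist()

# Every value in group a is below every value in group b
# exactly when max(a) < min(b) for that row.
df["a_lt_b"] = [
    int(max(a) < min(b)) for a, b in zip(df["combined1"], df["combined2"])
]
print(df["a_lt_b"].tolist())
```

The list comprehension runs in plain Python, so on a large frame it will likely be slower than a vectorized row-wise max/min.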

Answer 1: (score: 0)

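You can compare the maximum of group A (the group whose values should be smaller) with the minimum of group B (the group whose values should be larger). If the largest A value in a row is below the smallest B value, then every A value is below every B value.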

For example:

res_comp = np.max(df[grp_A], axis=1) < np.min(df[grp_B], axis=1)
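
To illustrate why the max/min shortcut gives the same answer as comparing every pair, here is a small sketch on made-up data. The toy values and the column name `a_below_b` are assumptions; the brute-force check uses NumPy broadcasting and is only there for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; names and values are made up for illustration.
df = pd.DataFrame({"A": [3, 6, 4], "B": [4, 3, 4], "G": [5, 7, 4], "H": [9, 8, 6]})
grp_A = ["A", "B"]
grp_B = ["G", "H"]

# Shortcut: one max and one min per row.
res_comp = df[grp_A].max(axis=1) < df[grp_B].min(axis=1)

# Brute force: compare every A value with every B value in the same row.
a = df[grp_A].to_numpy()[:, :, None]  # shape (rows, len(grp_A), 1)
b = df[grp_B].to_numpy()[:, None, :]  # shape (rows, 1, len(grp_B))
pairwise = (a < b).all(axis=(1, 2))

# Store the result as a 0/1 column, as the question asks.
df["a_below_b"] = res_comp.astype(int)
print(df["a_below_b"].tolist())
```

With 100 and 200 columns the brute-force version builds 20,000 comparisons per row, so the max/min form is the cheaper of the two.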

Full setup (with dummy data):

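import pandas as pd
import numpy as np

grp_A = list("ABCDEF")
grp_B = list("GHIJKLMNOPRST")

# Dummy data: group A draws from 3..6, group B from 5..11
# (np.random.randint excludes the upper bound)
df = pd.DataFrame()
for col in grp_A + grp_B:
    if col in grp_A:
        df[col] = np.random.randint(3, 7, size=3000)
    else:
        df[col] = np.random.randint(5, 12, size=3000)

# True where every group A value in the row is below every group B value
res_comp = np.max(df[grp_A], axis=1) < np.min(df[grp_B], axis=1)

print(res_comp.value_counts())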

Output (the counts vary between runs because the data is random):

False    2860
True      140
dtype: int64
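
As for the attempt in the question, a likely reason it appears not to work is that `DataFrame.lt` matches columns by label when given another DataFrame. The two groups share no column names, so nothing lines up. A minimal sketch on made-up data (the values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical toy data; names and values are made up for illustration.
df = pd.DataFrame({"A": [1, 2], "B": [2, 3], "O": [5, 6], "P": [7, 8]})
column_list_a = ["A", "B"]
column_list_b = ["O", "P"]

# .lt() aligns the two frames on column labels. No label is shared, so each
# column is compared against missing values and every cell comes out False.
result = df[column_list_a].lt(df[column_list_b])
print(result.columns.tolist())
print(result.to_numpy().any())
```

Reducing each group to a single value per row first, as in the max/min comparison, avoids the label alignment altogether.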