Question

I have a DataFrame with more than 500 columns. How can I do a conditional comparison across two groups of columns?

column_list_a = ['A', 'B', ..., 'N']  # say 100 columns
column_list_b = ['O', 'P', ..., 'Z']  # say 200 columns

I tried:

df[column_list_a ].lt(df[column_list_b])

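But it does not work.

The goal is to check, for each row, whether every value in the column_list_a columns is less than every value in the column_list_b columns.

I also want to store the binary result in a new column.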

Answer 1

OK, I found a workaround.

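The two assignments below create two new columns, holding per-row lists of the values from the 100 columns and the 200 columns respectively: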

 df['combined1'] = other[column_list_a].values.tolist()
 df['combined2'] = other[column_list_b].values.tolist()

This generalizes to any number of columns.
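
The list columns on their own do not yet answer the question. One possible way to finish is sketched below on a small made-up frame. The data, the `a_lt_b` column name, and the use of `df` itself as the source frame are assumptions for illustration. The idea is to compare, row by row, the largest value of the first list with the smallest value of the second:

```python
import pandas as pd

# Small made-up frame; the real one would have hundreds of columns.
df = pd.DataFrame({
    "A": [1, 5, 2],
    "B": [2, 1, 9],
    "O": [3, 4, 8],
    "P": [4, 6, 7],
})
column_list_a = ["A", "B"]
column_list_b = ["O", "P"]

# Same idea as above: one list of values per row for each group.
df["combined1"] = df[column_list_a].values.tolist()
df["combined2"] = df[column_list_b].values.tolist()

# Every value in group a is below every value in group b exactly when
# max(group a) < min(group b). Store the result as 0/1.
df["a_lt_b"] = [
    int(max(a) < min(b))
    for a, b in zip(df["combined1"], df["combined2"])
]
print(df["a_lt_b"].tolist())  # [1, 0, 0]
```

The Python-level loop is easy to read but is likely to be slower on large frames than a vectorized row-wise max/min comparison.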

Answer 2

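You can compare the maximum of group A (the one that should be smaller) with the minimum of group B (the one that should be larger).

For example: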

res_comp=np.max(df[grp_A], axis=1)<np.min(df[grp_B], axis=1)

Full setup (with dummy data):

import pandas as pd
import numpy as np

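df = pd.DataFrame()
grp_A = list("ABCDEF")
grp_B = list("GHIJKLMNOPRST")

# Group A columns get values in 3..6, group B columns values in 5..11,
# so only some rows satisfy max(A) < min(B).
for l in grp_A + grp_B:
    if l in grp_A:
        df[l] = np.random.randint(3, 7, size=3000)
    else:
        df[l] = np.random.randint(5, 12, size=3000)

# True where every group A value is below every group B value in that row.
res_comp = np.max(df[grp_A], axis=1) < np.min(df[grp_B], axis=1)

print(res_comp.value_counts())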

Output (the exact counts vary from run to run because the data is random):

False    2860
True      140
dtype: int64
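
The question also asks for the result in a new column. A minimal sketch follows, assuming a hypothetical column name `a_lt_b` and a tiny hand-made frame in place of the random data above. The boolean Series can be assigned directly, and `astype(int)` turns it into 0/1:

```python
import pandas as pd

# Tiny hand-made frame so the result is easy to check by eye.
df = pd.DataFrame({
    "A": [3, 6, 4],
    "B": [4, 3, 5],
    "G": [5, 7, 5],
    "H": [9, 8, 6],
})
grp_A = ["A", "B"]
grp_B = ["G", "H"]

# Row-wise max of group A against row-wise min of group B.
res_comp = df[grp_A].max(axis=1) < df[grp_B].min(axis=1)

# Store as 0/1 in a new column (the name is chosen for illustration).
df["a_lt_b"] = res_comp.astype(int)
print(df["a_lt_b"].tolist())  # [1, 1, 0]
```

Note that `DataFrame.max` and `DataFrame.min` skip missing values by default, so a row containing NaN is compared on its remaining values.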

Pandas: comparing multiple columns
