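I have a DataFrame with more than 500 columns. How can I do a conditional comparison between two groups of columns?
column_list_a = ['A', 'B', ..., 'N']  # say, 100 columns
column_list_b = ['O', 'P', ..., 'Z']  # say, 200 columns
I tried:
df[column_list_a].lt(df[column_list_b])
but it does not work.
The goal is to check, for each row, whether every value in the column_list_a columns is less than every value in the column_list_b columns.
I would also like to store the binary result in a new column.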
Answer 0: (score: 0)
OK, I found a workaround:
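Now we have two new columns, each holding a list of values per row: one built from the 100 columns and one from the 200 columns.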
df['combined1'] = other[column_list_a].values.tolist()
df['combined2'] = other[column_list_b].values.tolist()
This generalizes to any number of columns.
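The step that actually compares the two list columns is not shown above. One possible way to finish, sketched under the assumption that "every value in the first list is below every value in the second" is the intended test, is to compare the largest value of one list with the smallest value of the other. The toy frame and the `all_a_lt_b` column name are made up for illustration, and the sketch reads from `df` throughout.

```python
import pandas as pd

# Toy data; column names and values are illustrative only.
df = pd.DataFrame({"A": [1, 5], "B": [2, 1], "O": [3, 4], "P": [4, 6]})
column_list_a = ["A", "B"]
column_list_b = ["O", "P"]

# One list per row for each group of columns.
df["combined1"] = df[column_list_a].values.tolist()
df["combined2"] = df[column_list_b].values.tolist()

# Every value in combined1 is below every value in combined2
# exactly when max(combined1) < min(combined2).
df["all_a_lt_b"] = [
    int(max(a) < min(b)) for a, b in zip(df["combined1"], df["combined2"])
]
```

Building Python lists for every row is convenient but slower than a vectorized comparison on wide frames, so this is mainly useful when the list columns are needed for something else as well.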
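Answer 1: (score: 0)
You can compare the row-wise maximum of group A (the group that should be smaller) with the row-wise minimum of group B (the group that should be larger). If the largest A value is below the smallest B value, then every A value is below every B value.
For example:
res_comp = np.max(df[grp_A], axis=1) < np.min(df[grp_B], axis=1)
Full setup (with dummy data):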
import pandas as pd
import numpy as np

df = pd.DataFrame()
grp_A = list("ABCDEF")
grp_B = list("GHIJKLMNOPRST")
for l in list("ABCDEFGHIJKLMNOPRST"):
    if l in grp_A:
        # group A: random integers in [3, 7)
        df[l] = np.random.randint(3, 7, size=3000)
    else:
        # group B: random integers in [5, 12)
        df[l] = np.random.randint(5, 12, size=3000)
# True where the largest A value in a row is below the smallest B value
res_comp = np.max(df[grp_A], axis=1) < np.min(df[grp_B], axis=1)
print(res_comp.value_counts())
Output (the exact counts vary from run to run because the data is random):
False 2860
True 140
dtype: int64
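To cover the last part of the question, the boolean Series can be written back into the frame as a 0/1 column. The following is a small deterministic sketch rather than part of the original answer: it uses the pandas `max`/`min` methods in place of `np.max`/`np.min`, and the data and the `a_lt_b` column name are illustrative assumptions.

```python
import pandas as pd

# Small fixed example so the result can be checked by hand.
df = pd.DataFrame({
    "A": [1, 2, 6],
    "B": [2, 3, 1],
    "G": [5, 3, 7],
    "H": [4, 9, 8],
})
grp_A = ["A", "B"]
grp_B = ["G", "H"]

# Row-wise: largest value in group A strictly below smallest value in group B.
df["a_lt_b"] = (df[grp_A].max(axis=1) < df[grp_B].min(axis=1)).astype(int)
print(df["a_lt_b"].tolist())  # [1, 0, 1]
```

The comparison is strict, so the second row (maximum 3 in group A, minimum 3 in group B) yields 0. Use `<=` instead if ties should count as a match.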