I have a sample dataset:
import pandas as pd
import numpy as np
d = {
    'ID': ['A', 'B', 'C', 'D', 'E'],
    'index_1': [2, 0, 2, -2, 0],
    'index_2': [-2, -2, 0, 0, 0],
    'index_3': [2, 2, 2, 2, 0],
    'index_4': [2, 2, 0, -2, 0],
    'index_total': [2, 2, 2, 2, 2]
}
df = pd.DataFrame(d)
It looks like this:
  ID  index_1  index_2  index_3  index_4  index_total
0  A        2       -2        2        2            2
1  B        0       -2        2        2            2
2  C        2        0        2        0            2
3  D       -2        0        2       -2            2
4  E        0        0        0        0            2
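I want to create a column named "flag" for each row, based on the following conditions:
flag is 1 if any of index_1 through index_4 equals -2 and index_total equals 2;
flag is 1 if all of index_1 through index_4 equal 0 and index_total equals 2;
otherwise flag is 0.
Expected output: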
  ID  index_1  index_2  index_3  index_4  index_total  flag
0  A        2       -2        2        2            2     1
1  B        0       -2        2        2            2     1
2  C        2        0        2        0            2     0
3  D       -2        0        2       -2            2     1
4  E        0        0        0        0            2     1
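My attempts (note that I loop over the index_1, index_2, index_3 and index_4 column names instead of writing them out, because my real dataset has more than 70 index_ columns).
First attempt: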
for colname in df.columns:
    if "index_" in colname:
        df[colname] = df[colname].astype(int)
        # making sure the numbers are all integers for comparison
        if ((df[colname] == -2).any() and df['index_total']==2):
            df['flag'] = 1
            # this doesn't work, it's going by columns, not rows
Second attempt:
for index, row in df.iterrows():
    for colname in df.columns:
        if "index_" in colname:
            if( (df[colname][index] == -2).any() and df['index_total']==2 ):
                df['flag'] = 1
                # I stopped writing the other conditions because this one doesn't work
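Both attempts combine a test that yields a single boolean with `df['index_total']==2`, which yields one boolean per row. When the first test is true, Python's `and` hands the whole Series to `if`, which cannot reduce it to one truth value. A minimal sketch of that failure, using only two of the columns from the sample data (the variable names are illustrative, not from the post):

```python
import pandas as pd

df = pd.DataFrame({
    'index_1': [2, 0, 2, -2, 0],
    'index_total': [2, 2, 2, 2, 2],
})

# One bool for the whole column versus one bool per row (a Series).
column_has_minus_two = bool((df['index_1'] == -2).any())
total_is_two = df['index_total'] == 2

error_message = None
try:
    # `True and Series` evaluates to the Series, which `if` cannot reduce to one bool.
    if column_has_minus_two and total_is_two:
        pass
except ValueError as exc:
    error_message = str(exc)

# Element-wise alternative: & combines the two tests row by row.
row_mask = (df['index_1'] == -2) & total_is_two
```

The element-wise `&` is the building block that the row-wise masks in the answers rely on.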
Answer 0: (score: 2)
First condition:
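# cols is not defined in the post; assumed here to be the index_N columns without index_total
cols = [c for c in df.columns if c.startswith('index_') and c != 'index_total']
np.where(df[cols].eq(-2).any(axis=1) & df['index_total'].eq(2))
# (array([0, 1, 3], dtype=int64),)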
Second condition:
np.where(df[cols].eq(0).all(axis=1) & df['index_total'].eq(2))
# (array([4], dtype=int64),)
Use np.where to create the new column:
c1 = df[cols].eq(-2).any(axis=1) & df['index_total'].eq(2)
c2 = df[cols].eq(0).all(axis=1) & df['index_total'].eq(2)
df['Flag'] = np.where(c1 | c2, 1, 0)
  ID  index_1  index_2  index_3  index_4  index_total  Flag
0  A        2       -2        2        2            2     1
1  B        0       -2        2        2            2     1
2  C        2        0        2        0            2     0
3  D       -2        0        2       -2            2     1
4  E        0        0        0        0            2     1
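For reference, the steps of this answer can be put together into one self-contained sketch. It assumes that `cols` means the list of `index_N` columns without `index_total`; the list comprehension and the `total_is_two` name are illustrative choices, not part of the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': ['A', 'B', 'C', 'D', 'E'],
    'index_1': [2, 0, 2, -2, 0],
    'index_2': [-2, -2, 0, 0, 0],
    'index_3': [2, 2, 2, 2, 0],
    'index_4': [2, 2, 0, -2, 0],
    'index_total': [2, 2, 2, 2, 2],
})

# Assumption: cols is every column starting with "index_" except index_total.
cols = [c for c in df.columns if c.startswith('index_') and c != 'index_total']

# Shared part of both conditions, computed once.
total_is_two = df['index_total'].eq(2)

c1 = df[cols].eq(-2).any(axis=1) & total_is_two  # any index_N equals -2
c2 = df[cols].eq(0).all(axis=1) & total_is_two   # all index_N equal 0

df['Flag'] = np.where(c1 | c2, 1, 0)
```

Building `cols` from the column names keeps the code the same length whether there are 4 or 70 index_ columns.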
Answer 1: (score: 2)
Use any, all, and boolean masking (comments inline).
# sub-select your columns of interest
i = df.filter(regex=r'index_\d+')
# this is a common mask, we'll compute it once and use it later
j = df['index_total'].eq(2)
m1 = i.eq(-2).any(axis=1) & j  # first condition
m2 = i.eq(0).all(axis=1) & j   # second condition
# compute the union of the masks and convert to int
df['flag'] = (m1 | m2).astype(int)
df
  ID  index_1  index_2  index_3  index_4  index_total  flag
0  A        2       -2        2        2            2     1
1  B        0       -2        2        2            2     1
2  C        2        0        2        0            2     0
3  D       -2        0        2       -2            2     1
4  E        0        0        0        0            2     1
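The column selection above depends on the regex: `index_\d+` needs at least one digit after the underscore, so `index_total` is left out. Since `DataFrame.filter` matches with `re.search`, an unanchored pattern would also pick up a longer name that merely contains `index_1`. A small sketch of the difference, where the `index_1_note` column is a hypothetical addition that is not in the question:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['A', 'B'],
    'index_1': [2, 0],
    'index_2': [-2, -2],
    'index_total': [2, 2],
    'index_1_note': ['x', 'y'],  # hypothetical extra column, not in the question
})

# Unanchored pattern: matches anywhere in the column name.
loose_cols = list(df.filter(regex=r'index_\d+').columns)

# Anchored pattern: the whole name must be "index_" followed by digits.
strict_cols = list(df.filter(regex=r'^index_\d+$').columns)
```

With the question's column names both patterns select the same columns; the anchored form only matters if the real dataset has names like the hypothetical one here.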
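Answer 2: (score: 1)
Write a function that takes a row and applies the logic.
Since you say you have many columns, we will use Python's built-in any and all. This assumes index_total is the last column and ID is the first column.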
def functo(row):
    # row.iloc[1:-1] holds the index_N values; row.iloc[-1] is index_total
    if any(i == -2 for i in row.iloc[1:-1]) and row.iloc[-1] == 2:
        return 1
    elif all(i == 0 for i in row.iloc[1:-1]) and row.iloc[-1] == 2:
        return 1
    else:
        return 0
Then apply it to your DataFrame:
df['flag'] = df.apply(functo, axis=1)
We use axis=1 so that the function is applied to each row rather than to each column.
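The positional assumption (ID first, index_total last) breaks as soon as another column is appended, for example after flag itself has been added. One possible variant selects the values by column name instead; `index_cols` and `flag_row` are hypothetical names used only for this sketch.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['A', 'B', 'C', 'D', 'E'],
    'index_1': [2, 0, 2, -2, 0],
    'index_2': [-2, -2, 0, 0, 0],
    'index_3': [2, 2, 2, 2, 0],
    'index_4': [2, 2, 0, -2, 0],
    'index_total': [2, 2, 2, 2, 2],
})

# Hypothetical helper list: the index_N columns, picked by name rather than position.
index_cols = [c for c in df.columns if c.startswith('index_') and c != 'index_total']

def flag_row(row):
    # Both conditions require index_total == 2, so check that first.
    if row['index_total'] != 2:
        return 0
    values = row[index_cols]
    # 1 if any value is -2 or all values are 0, otherwise 0.
    return int((values == -2).any() or (values == 0).all())

df['flag'] = df.apply(flag_row, axis=1)
```

Running the last line a second time gives the same result, because the existing flag column is never read by position.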
Also, a tip: I would avoid naming columns index, because in pandas terminology the index refers to the row labels.