Question

I have a sample dataset:

import pandas as pd
import numpy as np

d = {
    'ID': ['A', 'B', 'C', 'D', 'E'],
    'index_1': [2, 0, 2, -2, 0],
    'index_2': [-2, -2, 0, 0, 0],
    'index_3': [2, 2, 2, 2, 0],
    'index_4': [2, 2, 0, -2, 0],
    'index_total': [2, 2, 2, 2, 2]
}
df = pd.DataFrame(d)

It looks like this:

   ID   index_1  index_2  index_3   index_4   index_total
0   A        2       -2        2        2            2
1   B        0       -2        2        2            2
2   C        2        0        2        0            2
3   D       -2        0        2       -2            2
4   E        0        0        0        0            2

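I want to create a column named "flag" for each row, based on the following conditions:

If any of the columns 'index_1', 'index_2', 'index_3', 'index_4' contains the value -2 AND 'index_total' = 2, then flag = 1
If the columns 'index_1', 'index_2', 'index_3', 'index_4' contain only the value 0 AND 'index_total' = 2, then flag = 1
else: flag = 0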

Expected output:

    ID   index_1  index_2  index_3   index_4   index_total   flag
0    A        2       -2        2        2            2        1
1    B        0       -2        2        2            2        1
2    C        2        0        2        0            2        0
3    D       -2        0        2       -2            2        1
4    E        0        0        0        0            2        1

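My attempts (note that I loop over the column names index_1, index_2, index_3, and index_4 instead of writing them out, because my actual dataset has more than 70 index_ columns):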

First attempt:

for colname in df.columns:
    if "index_" in colname:
        df[colname] = df[colname].astype(int)  
 #making sure the numbers are all integer for comparison
    if ((df[colname] == -2).any() and df['index_total']==2):
         df['flag'] = 1
  #this doesn't work , it's going by columns not rows

Second attempt:

 for index, row in df.iterrows():    
    for colname in df.columns:
       if "index_" in colname:
           if( (df[colname][index] == -2).any() and df['index_total']==2 ):
                df['flag'] = 1
 # i stopped writing the other conditions because this one doesn't work

Answer 1

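First condition:

# Assumed definition of cols: every index_N column, excluding index_total
cols = [c for c in df.columns if c.startswith('index_') and c != 'index_total']
np.where(df[cols].eq(-2).any(axis=1) & df['index_total'].eq(2))

# (array([0, 1, 3], dtype=int64),)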

Second condition:

np.where(df[cols].eq(0).all(axis=1) & df['index_total'].eq(2))

# (array([4], dtype=int64),)

Use np.where to create the new column:

c1 = df[cols].eq(-2).any(axis=1) & df['index_total'].eq(2)
c2 = df[cols].eq(0).all(axis=1) & df['index_total'].eq(2)

df['Flag'] = np.where(c1 | c2, 1, 0)

  ID  index_1  index_2  index_3  index_4  index_total       Flag
0  A        2       -2        2        2            2          1
1  B        0       -2        2        2            2          1
2  C        2        0        2        0            2          0
3  D       -2        0        2       -2            2          1
4  E        0        0        0        0            2          1
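
For reference, the steps above can be put together into one self-contained script. This is a minimal sketch, not the answerer's exact code: the way `cols` is built (every column starting with `index_` except `index_total`) and the helper name `total_is_2` are assumptions made so the same code would also cover a frame with 70+ index_ columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': ['A', 'B', 'C', 'D', 'E'],
    'index_1': [2, 0, 2, -2, 0],
    'index_2': [-2, -2, 0, 0, 0],
    'index_3': [2, 2, 2, 2, 0],
    'index_4': [2, 2, 0, -2, 0],
    'index_total': [2, 2, 2, 2, 2],
})

# Assumption: the columns to test are all index_ columns except index_total.
cols = [c for c in df.columns if c.startswith('index_') and c != 'index_total']

# Row-wise masks: axis=1 reduces across the columns of each row.
total_is_2 = df['index_total'].eq(2)
c1 = df[cols].eq(-2).any(axis=1) & total_is_2   # at least one -2 in the row
c2 = df[cols].eq(0).all(axis=1) & total_is_2    # every value in the row is 0

df['Flag'] = np.where(c1 | c2, 1, 0)
print(df['Flag'].tolist())   # [1, 1, 0, 1, 1]
```

Passing `axis=1` by keyword matters on recent pandas releases, where the positional form `any(1)` is no longer accepted.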

Answer 2

`any`, `all`, and boolean masking

(Comments inline.)

# sub-select your column of interest
i = df.filter(regex=r'index_\d+')
# this is a common mask, we'll compute it once and use later
j = df['index_total'].eq(2)

m1 = i.eq(-2).any(axis=1) & j   # first condition
m2 = i.eq(0).all(axis=1) & j    # second condition
# compute the union of the masks and convert to int
df['flag'] = (m1 | m2).astype(int)

df
  ID  index_1  index_2  index_3  index_4  index_total  flag
0  A        2       -2        2        2            2     1
1  B        0       -2        2        2            2     1
2  C        2        0        2        0            2     0
3  D       -2        0        2       -2            2     1
4  E        0        0        0        0            2     1
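
To see what the regex filter and the shared `index_total` mask each contribute, here is a small sketch on hypothetical data (a three-row frame invented for illustration, where `index_total` is not always 2). The name `flag` for the resulting Series is also just an illustration.

```python
import pandas as pd

# Hypothetical data: same layout as the question, but index_total varies.
df = pd.DataFrame({
    'ID': ['A', 'B', 'C'],
    'index_1': [-2, 0, 2],
    'index_2': [2, 0, 2],
    'index_total': [2, 0, 2],
})

# The pattern requires digits after "index_", so index_total is not selected.
i = df.filter(regex=r'index_\d+')
print(list(i.columns))   # ['index_1', 'index_2']

# Shared mask: rows where index_total equals 2.
j = df['index_total'].eq(2)

# Row B is all zeros, but its index_total is 0, so it is not flagged.
flag = ((i.eq(-2).any(axis=1) | i.eq(0).all(axis=1)) & j).astype(int)
print(flag.tolist())     # [1, 0, 0]
```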

Answer 3

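Write a function that takes a row and performs the logic:

Since you said you have a lot of columns, we'll use Python's built-in any and all. This assumes index_total is the last column and ID is the first.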

def functo(row):
    # row.iloc[1:-1] holds the index_N values; row.iloc[-1] is index_total
    if any(i == -2 for i in row.iloc[1:-1]) and row.iloc[-1] == 2:
        return 1
    elif all(i == 0 for i in row.iloc[1:-1]) and row.iloc[-1] == 2:
        return 1
    else:
        return 0

And apply it to your DataFrame:

df['flag'] = df.apply(functo, axis=1)

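We use axis=1 to apply your function to rows instead of columns.

Also, a tip: I would avoid naming columns index, because in pandas terminology the index refers to the row labels.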
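
If the column order is not guaranteed, the row function can select values by label instead of by position. The following is a sketch under that assumption; `index_cols` and `flag_row` are hypothetical names, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['A', 'B', 'C', 'D', 'E'],
    'index_1': [2, 0, 2, -2, 0],
    'index_2': [-2, -2, 0, 0, 0],
    'index_3': [2, 2, 2, 2, 0],
    'index_4': [2, 2, 0, -2, 0],
    'index_total': [2, 2, 2, 2, 2],
})

# Assumption: the values to test live in every index_ column except index_total.
index_cols = [c for c in df.columns if c.startswith('index_') and c != 'index_total']

def flag_row(row):
    # Both conditions require index_total == 2, so check that first.
    if row['index_total'] != 2:
        return 0
    values = row[index_cols]   # label-based, independent of column order
    return int((values == -2).any() or (values == 0).all())

# axis=1 passes each row (as a Series) to flag_row.
df['flag'] = df.apply(flag_row, axis=1)
print(df['flag'].tolist())   # [1, 1, 0, 1, 1]
```

A row-wise apply calls the Python function once per row, so on large frames it will generally be slower than the boolean-mask approaches.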
