Iterating over a list to build an OR statement

Time: 2019-05-23 13:11:25

Tags: python pandas dataframe

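I have a function that takes a DataFrame and applies a series of filters, joined by OR, to specific columns. A row only needs one of those columns to be below 96 to pass the filter.

This code works, but I would like to improve the function so that I can pass it a list of the columns to filter on, instead of hard-coding the columns into the function.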


def remove_never_used_focus(drugs, df):
    """ Filters out values above 95 which are
    codes for never used or not answered """

    df = df[
        (df['CAN_060'] < 96) |
#         (df['ALC_30'] < 96) |
        (df['PS_30'] < 96) |
        (df['COC_20'] < 96) |
        (df['HAL_20'] < 96) |
        (df['MET_20'] < 96) |
        (df['XTC_20'] < 96) |
        (df['GLU_20'] < 96) |
        (df['HER_20'] < 96) |
        (df['SAL_20'] < 96) 
        ]

    # this produces an `AND` statement; I would like an `OR` statement
    for drug in drugs:
        df = df[(df[drug]) < 96]

    display(df)

    return df

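The only way I could think of to build this statement is to loop over the list and build it up step by step. However, that produces an AND statement.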
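The loop yields AND because each pass filters the frame that the previous pass already filtered. A minimal sketch of a loop that keeps the step-by-step style but gives OR semantics is to accumulate a boolean mask and index the frame once at the end. The helper name `filter_any_below`, its `threshold` parameter, and the sample data are hypothetical and only serve the illustration.

```python
import pandas as pd

def filter_any_below(df, columns, threshold=96):
    """Keep rows where at least one of `columns` is below `threshold`."""
    # Start from an all-False mask and OR each column's test into it,
    # instead of re-filtering the frame on every pass (which gives AND).
    mask = pd.Series(False, index=df.index)
    for col in columns:
        mask = mask | (df[col] < threshold)
    return df[mask]

# Made-up sample data, for illustration only.
sample = pd.DataFrame({
    'CAN_060': [400, 512, 4, 5, 5, 400],
    'PS_30': [742, 8, 9, 4, 200, 300],
    'COC_20': [100, 3, 5, 7, 100, 100],
})
result = filter_any_below(sample, ['CAN_060', 'PS_30', 'COC_20'])
print(result.index.tolist())
```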

2 answers:

Answer 0: (score: 1)

Use DataFrame.any to test whether at least one value per row is True in the filtered columns:

import pandas as pd

df = pd.DataFrame({
    'A': list('abcdef'),
    'CAN_060': [400, 512, 4, 5, 5, 400],
    'PS_30': [742, 8, 9, 4, 200, 300],
    'COC_20': [100, 3, 5, 7, 100, 100],
    'E': [5, 3, 6, 9, 2, 4],
    'F': list('aaabbb')
})

print (df)
   A  CAN_060  PS_30  COC_20  E  F
0  a      400    742     100  5  a
1  b      512      8       3  3  a
2  c        4      9       5  6  a
3  d        5      4       7  9  b
4  e        5    200     100  2  b
5  f      400    300     100  4  b

cols = ['CAN_060','PS_30','COC_20']

print ((df[cols] < 96))
   CAN_060  PS_30  COC_20
0    False  False   False
1    False   True    True
2     True   True    True
3     True   True    True
4     True  False   False
5    False  False   False

df1 = df[(df[cols] < 96).any(axis=1)]
print (df1)
   A  CAN_060  PS_30  COC_20  E  F
1  b      512      8       3  3  a
2  c        4      9       5  6  a
3  d        5      4       7  9  b
4  e        5    200     100  2  b

# for AND, use all() to test whether all values in each row are True
df2 = df[(df[cols] < 96).all(axis=1)]
print (df2)
   A  CAN_060  PS_30  COC_20  E  F
2  c        4      9       5  6  a
3  d        5      4       7  9  b
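Applied to the function from the question, the same idea might look like the sketch below. It assumes `drugs` is a list of column names that all exist in `df`; the `display` call is left out so the block runs outside a notebook, and the sample frame is only there to exercise the function.

```python
import pandas as pd

def remove_never_used_focus(drugs, df):
    """Keep rows where at least one column in `drugs` is below 96.

    Values above 95 are codes for "never used" or "not answered".
    """
    # list(drugs) makes pandas select several columns even if a tuple
    # is passed; any(axis=1) is True when any test in a row passes.
    return df[(df[list(drugs)] < 96).any(axis=1)]

# Sample data, for illustration only.
survey = pd.DataFrame({
    'CAN_060': [400, 512, 4, 5, 5, 400],
    'PS_30': [742, 8, 9, 4, 200, 300],
    'COC_20': [100, 3, 5, 7, 100, 100],
})
kept = remove_never_used_focus(['CAN_060', 'PS_30', 'COC_20'], survey)
print(kept)
```

Because the column list is now an argument, dropping a column such as `ALC_30` from the filter only means leaving it out of the list passed in.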

Answer 1: (score: 0)

I think in your case you should try using the pandas.eval function to join the operations you want to perform:

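# join the per-column tests with ' | ' so the expression has no leading operator
operations = ' | '.join('(df.' + drug + ' < 96)' for drug in drugs)

# pd.eval returns a boolean mask; index df with it to keep the matching rows
df = df[pd.eval(operations)]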
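A related string-based sketch uses DataFrame.query rather than pandas.eval, so the expression can refer to the columns by name and the filtered rows come back directly. It assumes every column name is a valid Python identifier; the variable names and the sample data are illustrative.

```python
import pandas as pd

# Sample data, for illustration only.
df = pd.DataFrame({
    'CAN_060': [400, 512, 4, 5, 5, 400],
    'PS_30': [742, 8, 9, 4, 200, 300],
    'COC_20': [100, 3, 5, 7, 100, 100],
})
drugs = ['CAN_060', 'PS_30', 'COC_20']

# Build "(CAN_060 < 96) | (PS_30 < 96) | (COC_20 < 96)" from the list.
expr = ' | '.join(f'({drug} < 96)' for drug in drugs)

# query evaluates the expression against the frame's columns and
# returns only the rows for which it is True.
filtered = df.query(expr)
print(expr)
print(filtered)
```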