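I have a function that takes a DataFrame and applies a series of filters to specific columns, joined by OR.
Only one of the columns needs to be below 96 for a row to pass the filter.
This code works, but I would like to improve the function so that I can pass a list of columns to filter on,
instead of hard-coding the columns in the function.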
from IPython.display import display  # built in under Jupyter; import needed elsewhere

def remove_never_used_focus(drugs, df):
    """ Filters out values above 95, which are
    codes for never used or not answered """
    df = df[
        (df['CAN_060'] < 96) |
        # (df['ALC_30'] < 96) |
        (df['PS_30'] < 96) |
        (df['COC_20'] < 96) |
        (df['HAL_20'] < 96) |
        (df['MET_20'] < 96) |
        (df['XTC_20'] < 96) |
        (df['GLU_20'] < 96) |
        (df['HER_20'] < 96) |
        (df['SAL_20'] < 96)
    ]
    # This produces an `AND` statement; I would like an `OR` statement
    for drug in drugs:
        df = df[df[drug] < 96]
    display(df)
    return df
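The only way I can think of to build this statement is to iterate over the list
and build it up step by step. However, that produces an AND
statement.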
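The loop itself is not the problem: filtering `df` inside the loop narrows the frame on every iteration, which is why the conditions combine as AND. A minimal sketch that keeps the loop but accumulates a boolean mask with `|` instead gives OR semantics. The function name `filter_any_below` and its `threshold` parameter are hypothetical, chosen only for illustration:

```python
import pandas as pd


def filter_any_below(df, cols, threshold=96):
    """Keep rows where at least one of `cols` is below `threshold` (OR)."""
    # Start with an all-False mask aligned to df's index.
    mask = pd.Series(False, index=df.index)
    for col in cols:
        # OR each column's condition into the running mask instead of
        # filtering df on every pass (filtering each pass gives AND).
        mask |= df[col] < threshold
    return df[mask]


df = pd.DataFrame({
    'CAN_060': [400, 512, 4, 5, 5, 400],
    'PS_30': [742, 8, 9, 4, 200, 300],
    'COC_20': [100, 3, 5, 7, 100, 100],
})
print(filter_any_below(df, ['CAN_060', 'PS_30', 'COC_20']))
```

Starting from an all-False mask means an empty column list keeps no rows, which matches the meaning of an empty OR.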
Answer 0 (score: 1)
Use DataFrame.any
to test whether each row has at least one True value in the selected columns
:
import pandas as pd

df = pd.DataFrame({
    'A': list('abcdef'),
    'CAN_060': [400, 512, 4, 5, 5, 400],
    'PS_30': [742, 8, 9, 4, 200, 300],
    'COC_20': [100, 3, 5, 7, 100, 100],
    'E': [5, 3, 6, 9, 2, 4],
    'F': list('aaabbb')
})
print(df)
   A  CAN_060  PS_30  COC_20  E  F
0  a      400    742     100  5  a
1  b      512      8       3  3  a
2  c        4      9       5  6  a
3  d        5      4       7  9  b
4  e        5    200     100  2  b
5  f      400    300     100  4  b
cols = ['CAN_060', 'PS_30', 'COC_20']
print(df[cols] < 96)
   CAN_060  PS_30  COC_20
0    False  False   False
1    False   True    True
2     True   True    True
3     True   True    True
4     True  False   False
5    False  False   False
df1 = df[(df[cols] < 96).any(axis=1)]
print(df1)
   A  CAN_060  PS_30  COC_20  E  F
1  b      512      8       3  3  a
2  c        4      9       5  6  a
3  d        5      4       7  9  b
4  e        5    200     100  2  b
# For AND: test whether all values in each row are True
df2 = df[(df[cols] < 96).all(axis=1)]
print(df2)
   A  CAN_060  PS_30  COC_20  E  F
2  c        4      9       5  6  a
3  d        5      4       7  9  b
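Applied to the question's function, the `.any(axis=1)` approach lets the column list be passed in as an argument. A minimal sketch: the `threshold` parameter (default 96) is an assumption added for illustration, and the `display` call is dropped so the function runs outside Jupyter:

```python
import pandas as pd


def remove_never_used_focus(drugs, df, threshold=96):
    """Filter out rows where every column in `drugs` is >= threshold.

    Values of 96 and above are codes for never used or not answered,
    so a row is kept if at least one listed column holds a real answer.
    """
    # Boolean frame: True where the value is a real answer (< threshold).
    answered = df[drugs] < threshold
    # OR across the listed columns: keep rows with at least one True.
    return df[answered.any(axis=1)]


df = pd.DataFrame({
    'CAN_060': [400, 512, 4, 5, 5, 400],
    'PS_30': [742, 8, 9, 4, 200, 300],
    'COC_20': [100, 3, 5, 7, 100, 100],
})
print(remove_never_used_focus(['CAN_060', 'PS_30', 'COC_20'], df))
```

Switching `.any(axis=1)` to `.all(axis=1)` in the same function would give the AND behaviour instead.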
Answer 1 (score: 0)
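In your case, I think you could use the pandas.eval function: join the comparisons into a single string with `|` and evaluate it. (The `df.NAME` attribute access requires column names that are valid Python identifiers.)
import pandas as pd

# Join one comparison per column with ' | ' (no leading operator)
operations = ' | '.join('(df.' + drug + ' < 96)' for drug in drugs)
# pd.eval returns a boolean mask; use it to select the rows
df = df[pd.eval(operations)]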
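A quick self-contained check, assuming the sample frame from the first answer (without the unused `A`, `E`, `F` columns), suggests that the `pd.eval` mask selects the same rows as the `.any(axis=1)` mask:

```python
import pandas as pd

df = pd.DataFrame({
    'CAN_060': [400, 512, 4, 5, 5, 400],
    'PS_30': [742, 8, 9, 4, 200, 300],
    'COC_20': [100, 3, 5, 7, 100, 100],
})
drugs = ['CAN_060', 'PS_30', 'COC_20']

# Build "(df.CAN_060 < 96) | (df.PS_30 < 96) | (df.COC_20 < 96)"
operations = ' | '.join('(df.' + drug + ' < 96)' for drug in drugs)
print(operations)

# pd.eval resolves `df` from the calling scope and returns a boolean Series
eval_mask = pd.eval(operations)
any_mask = (df[drugs] < 96).any(axis=1)

print(df[eval_mask])
print(eval_mask.tolist() == any_mask.tolist())
```

Both masks mark the same rows; `.any(axis=1)` avoids building code as a string, while `pd.eval` keeps the OR expression explicit.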