How do I apply a list comprehension to multiple columns in a DataFrame?

Asked: 2019-07-15 14:12:53

Tags: string list dataframe comparison

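I am adding a new 0/1 column to a DataFrame based on the values of other columns in the same row. The value should be 1 if any of the following conditions holds, and 0 otherwise: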

y_train['SEPSISPATOS']=='Yes' OR
y_train['SEPSHOCKPATOS'] == 'Yes' OR 
y_train['OTHSYSEP'] == 'Sepsis' OR
y_train['OTHSESHOCK'] == 'Septic Shock' 

I have tried both a list comprehension and np.select (code below):

import numpy as np
import pandas as pd

NSQIPdf_train = pd.read_csv("acs_nsqip_puf13_2.csv",sep=',',encoding='utf-8')
y_train = NSQIPdf_train.loc[:,('SEPSISPATOS','SEPSHOCKPATOS', 'OTHSYSEP', 'OTHSESHOCK')]

### trying list comprehension
y_train['SEPSIS_STATUS'] = [1 if (x['SEPSISPATOS'] == 'Yes') or (x['SEPSHOCKPATOS'] == 'Yes') or (x['OTHSYSEP'] == 'Sepsis') or (x['OTHSESHOCK'] == 'Septic Shock') else 0 for x in y_train]

### trying np.select
conditions=[
    (y_train['SEPSISPATOS'] == 'Yes'),
    (y_train['SEPSHOCKPATOS'] == 'Yes'),
    (y_train['OTHSYSEP'] == 'Sepsis'),
    (y_train['OTHSESHOCK'] == 'Septic Shock')]
choices=[1,1,1,1]
y_train['SEPSIS_STATUS'] = np.select(conditions,choices,default=0)

print (y_train)
print (y_train.dtypes)

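With np.select, you can see that row 3 has OTHSESHOCK = 'Septic Shock', yet SEPSIS_STATUS is still 0 where I expected 1. The string comparison does not seem to work (sample output below). I wonder whether this is because the columns have dtype 'object' as a result of how pandas reads the CSV file, rather than because of what is in the strings themselves.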

       SEPSISPATOS SEPSHOCKPATOS  ...          OTHSESHOCK SEPSIS_STATUS
0            b'No'         b'No'  ...  b'No Complication'             0
1            b'No'         b'No'  ...  b'No Complication'             0
2            b'No'         b'No'  ...  b'No Complication'             0
3            b'No'         b'No'  ...     b'Septic Shock'             0
4            b'No'         b'No'  ...  b'No Complication'             0
5            b'No'         b'No'  ...  b'No Complication'             0
6            b'No'         b'No'  ...  b'No Complication'             0
7            b'No'         b'No'  ...  b'No Complication'             0
8            b'No'         b'No'  ...  b'No Complication'             0

With the list comprehension, I get the following error:

AttributeError: 'DataFrame' object has no attribute 'str'.

Finally, here are the dtypes of the variables as reported by print(df.dtypes):

SEPSISPATOS      object
SEPSHOCKPATOS    object
OTHSYSEP         object
OTHSESHOCK       object
SEPSIS_STATUS     int32
dtype: object

Any help is much appreciated. I have tried several approaches, but I am open to other suggestions or corrections. Thanks!

1 Answer:

Answer 0: (score: 0)

Try casting the columns to strings. I'm not sure what your DataFrame is called, but something like the following should work.

df.SEPSIS_SHOCK = df.SEPSIS_SHOCK.astype(str)
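
One possible reading of the sample output is that the cells hold either real bytes objects or text that literally includes the b'...' wrapper. In either case a comparison with 'Yes' never matches, and astype(str) alone would turn b'No' into the text "b'No'" rather than "No". The sketch below assumes that reading. The sample rows and the helper name clean are made up for illustration; only the column names and target values come from the question. It normalises the values first, then builds the flag both with vectorised comparisons and with a row-wise list comprehension.

```python
import pandas as pd

# Hypothetical sample rows imitating the question's output. Most cells are
# text that literally contains the b'...' wrapper; one cell is a real bytes
# object so that both cases are exercised.
y_train = pd.DataFrame({
    "SEPSISPATOS":   ["b'No'", "b'No'", b"Yes", "b'No'"],
    "SEPSHOCKPATOS": ["b'No'", "b'No'", "b'No'", "b'No'"],
    "OTHSYSEP":      ["b'No Complication'", "b'Sepsis'",
                      "b'No Complication'", "b'No Complication'"],
    "OTHSESHOCK":    ["b'No Complication'", "b'No Complication'",
                      "b'No Complication'", "b'Septic Shock'"],
})

cols = ["SEPSISPATOS", "SEPSHOCKPATOS", "OTHSYSEP", "OTHSESHOCK"]


def clean(value):
    """Return plain text for a bytes object or for text shaped like b'...'."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    text = str(value)
    has_wrapper = (
        len(text) >= 3
        and text[0] == "b"
        and text[1] in "'\""
        and text[-1] == text[1]
    )
    return text[2:-1] if has_wrapper else text


# Normalise every cell of the four source columns.
cleaned = y_train[cols].apply(lambda col: col.map(clean))

# Vectorised version: combine the four boolean Series with | (element-wise OR).
mask = (
    (cleaned["SEPSISPATOS"] == "Yes")
    | (cleaned["SEPSHOCKPATOS"] == "Yes")
    | (cleaned["OTHSYSEP"] == "Sepsis")
    | (cleaned["OTHSESHOCK"] == "Septic Shock")
)
y_train["SEPSIS_STATUS"] = mask.astype(int)

# List-comprehension version: zip the columns so that each iteration sees
# one row's values. Iterating over the DataFrame itself would yield the
# column names instead of rows.
status_lc = [
    1 if (a == "Yes" or b == "Yes" or c == "Sepsis" or d == "Septic Shock") else 0
    for a, b, c, d in zip(*(cleaned[name] for name in cols))
]

print(y_train)
```

If the CSV values turn out to be ordinary text without the wrapper, clean leaves them unchanged, so the same comparisons still apply.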