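I am creating a new 0/1 column in a DataFrame based on the values of other columns in the same row. The value should be 1 if any of the following conditions holds, and 0 otherwise: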
y_train['SEPSISPATOS']=='Yes' OR
y_train['SEPSHOCKPATOS'] == 'Yes' OR
y_train['OTHSYSEP'] == 'Sepsis' OR
y_train['OTHSESHOCK'] == 'Septic Shock'
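I tried both a list comprehension and np.select (code below):
import numpy as np
import pandas as pd

NSQIPdf_train = pd.read_csv("acs_nsqip_puf13_2.csv", sep=',', encoding='utf-8')
# Select the four source columns; .copy() avoids SettingWithCopyWarning when a column is added later
y_train = NSQIPdf_train.loc[:, ['SEPSISPATOS', 'SEPSHOCKPATOS', 'OTHSYSEP', 'OTHSESHOCK']].copy()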
### trying list comprehension
y_train['SEPSIS_STATUS'] = [1 if (x['SEPSISPATOS'] == 'Yes') or (x['SEPSHOCKPATOS'] == 'Yes') or (x['OTHSYSEP'] == 'Sepsis') or (x['OTHSESHOCK'] == 'Septic Shock') else 0 for x in y_train]
### trying np.select
conditions=[
(y_train['SEPSISPATOS'] == 'Yes'),
(y_train['SEPSHOCKPATOS'] == 'Yes'),
(y_train['OTHSYSEP'] == 'Sepsis'),
(y_train['OTHSESHOCK'] == 'Septic Shock')]
choices=[1,1,1,1]
y_train['SEPSIS_STATUS'] = np.select(conditions,choices,default=0)
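print(y_train)
print(y_train.dtypes)
With np.select, you can see that in row 3 OTHSESHOCK is 'Septic Shock', yet SEPSIS_STATUS is still 0 where I expected 1. The string comparison does not seem to work (sample output below). I wonder whether this is because the columns have dtype 'object' as a result of how pandas reads the CSV file, rather than because of what the strings contain.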
SEPSISPATOS SEPSHOCKPATOS ... OTHSESHOCK SEPSIS_STATUS
0 b'No' b'No' ... b'No Complication' 0
1 b'No' b'No' ... b'No Complication' 0
2 b'No' b'No' ... b'No Complication' 0
3 b'No' b'No' ... b'Septic Shock' 0
4 b'No' b'No' ... b'No Complication' 0
5 b'No' b'No' ... b'No Complication' 0
6 b'No' b'No' ... b'No Complication' 0
7 b'No' b'No' ... b'No Complication' 0
8 b'No' b'No' ... b'No Complication' 0
With the list comprehension, I get the following error:
AttributeError: 'DataFrame' object has no attribute 'str'.
Finally, here are the dtypes of the columns as reported by print(df.dtypes):
SEPSISPATOS object
SEPSHOCKPATOS object
OTHSYSEP object
OTHSESHOCK object
SEPSIS_STATUS int32
dtype: object
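Any help is much appreciated. I have tried several approaches, but I am open to other suggestions or corrections. Thanks!
Answer 0 (score: 0)
Try casting the columns to strings. I'm not sure what your DataFrame is called, but something like the following should work.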
df.SEPSIS_SHOCK = df.SEPSIS_SHOCK.astype(str)
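One caveat worth checking: the b'No' values in the printed output suggest the cells are either bytes objects or text that literally contains the b'...' wrapper. Python's str() on a bytes object returns text such as "b'No'", so a plain cast may still not compare equal to 'No'. Below is a minimal sketch of one way to normalize the values first and then build the flag with a vectorized OR. The helper to_text and the small sample frame are hypothetical stand-ins, not part of the original code, and which of the two cases applies to the real CSV is an assumption to verify.

```python
import pandas as pd


def to_text(value):
    """Hypothetical helper: return plain text for bytes or for "b'...'" style strings."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    if (
        isinstance(value, str)
        and len(value) >= 3
        and value[:2] in ("b'", 'b"')
        and value[-1] == value[1]
    ):
        # Text that merely looks like a bytes literal: drop the b'...' wrapper.
        return value[2:-1]
    return value


# Made-up sample rows standing in for the four columns read from the CSV.
y_train = pd.DataFrame({
    "SEPSISPATOS": [b"No", b"Yes", b"No", b"No"],
    "SEPSHOCKPATOS": [b"No", b"No", b"No", b"No"],
    "OTHSYSEP": [b"No Complication", b"No Complication", b"No Complication", b"Sepsis"],
    "OTHSESHOCK": [b"No Complication", b"No Complication", b"Septic Shock", b"No Complication"],
})

# Normalize every cell to a plain str before comparing.
cols = ["SEPSISPATOS", "SEPSHOCKPATOS", "OTHSYSEP", "OTHSESHOCK"]
for col in cols:
    y_train[col] = y_train[col].map(to_text)

# Element-wise OR of the four conditions, then cast the booleans to 0/1.
is_sepsis = (
    (y_train["SEPSISPATOS"] == "Yes")
    | (y_train["SEPSHOCKPATOS"] == "Yes")
    | (y_train["OTHSYSEP"] == "Sepsis")
    | (y_train["OTHSESHOCK"] == "Septic Shock")
)
y_train["SEPSIS_STATUS"] = is_sepsis.astype(int)
print(y_train["SEPSIS_STATUS"].tolist())
```

The boolean mask replaces both the list comprehension and np.select: iterating over a DataFrame yields its column labels rather than its rows, so the comprehension cannot work as written, while the | operator compares all rows at once.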