我想在Python中完成this one个R用户的类似问题。我的意图是创建一个新列,其值是根据其他列中的条件创建的
例如:
d = {'year': [2010, 2011,2013, 2014], 'PD': [0.5, 0.8, 0.9, np.nan], 'PD_thresh': [0.7, 0.8, 0.9, 0.7]}
df_temp = pd.DataFrame(data=d)
现在,我想创建一个条件:
伪代码:
if for year X the value of PD is greater or equal to the value of PD_thresh
then set 0 in a new column y_pseudo
otherwise set 1
我的预期结果是:
df_temp
Out[57]:
year PD PD_thresh y_pseudo
0 2010 0.5 0.7 0.0
1 2011 0.6 0.7 0.0
2 2013 0.9 0.8 1.0
3 2014 NaN 0.7 NaN
答案 0 :(得分:4)
将numpy.select
与isna
和ge
结合使用:
m1 = df_temp['PD'].isna()
m2 = df_temp['PD'].ge(df_temp['PD_thresh'])
df_temp['y_pseudo'] = np.select([m1, m2], [np.nan, 1], default=0)
print (df_temp)
year PD PD_thresh y_pseudo
0 2010 0.5 0.7 0.0
1 2011 0.6 0.8 0.0
2 2013 0.9 0.9 1.0
3 2014 NaN 0.7 NaN
另一种解决方案是将True/False
到1/0
映射的掩码转换为整数,并通过notna
仅设置不丢失的行:
m2 = df_temp['PD'].ge(df_temp['PD_thresh'])
m3 = df_temp['PD'].notna()
df_temp.loc[m3, 'y_pseudo'] = m2[m3].astype(int)
print (df_temp)
year PD PD_thresh y_pseudo
0 2010 0.5 0.7 0.0
1 2011 0.6 0.8 0.0
2 2013 0.9 0.9 1.0
3 2014 NaN 0.7 NaN
答案 1 :(得分:0)
您的数据d与您的结果不同,我认为您的意思是如果大于阈值则为1,而不是相反,所以我这样说:
mu_fit <- nls(mu ~ mu_fun(a,d,b), start = list(a = start_a, b = start_b))
输出:
y = [a if np.isnan(a) else 1 if a>=b else 0 for a,b in zip(df_temp.PD,df_temp.PD_thresh)]
df_temp['y_pseudo'] = y