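I want to create an IF condition that sets the values in a new column ('new_col'). The general idea is:
if 'Score' == np.nan and 'year' == 2012: return 1
elif 'Score' == np.nan and 'year' == 2013: return 2
else: return 'Score'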
import numpy as np
import pandas as pd
data = {'year': [2010, 2011, 2012, 2013, 2014], 'Score': [10, 15, np.nan, np.nan, 3]}
df = pd.DataFrame(data, columns=['year', 'Score'])
   year  Score
0  2010   10.0
1  2011   15.0
2  2012    1.0
3  2013    2.0
4  2014    3.0
Answer 0 (score: 0)
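First, use Series.isna to test for missing values: NaN never compares equal to anything, including itself, so Score == np.nan is always False. Then compare the year with Series.eq (equivalent to ==) and set the values with numpy.select: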
m1 = df['Score'].isna() & df['year'].eq(2012)
m2 = df['Score'].isna() & df['year'].eq(2013)
df['Score'] = np.select([m1, m2], [1,2], default=df['Score'])
print(df)
   year  Score
0  2010   10.0
1  2011   15.0
2  2012    1.0
3  2013    2.0
4  2014    3.0
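Why an equality test against np.nan cannot select the missing rows can be seen in a small sketch. It rebuilds the question's DataFrame; the names eq_mask and na_mask are illustrative only and do not appear in the original code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'year': [2010, 2011, 2012, 2013, 2014],
                   'Score': [10, 15, np.nan, np.nan, 3]})

# NaN is not equal to anything, including itself,
# so this mask is False in every row.
eq_mask = df['Score'] == np.nan

# isna() is the supported way to detect missing values.
na_mask = df['Score'].isna()

print(eq_mask.tolist())  # [False, False, False, False, False]
print(na_mask.tolist())  # [False, False, True, True, False]
```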
To write the result to a new column instead, leaving Score unchanged, use:
df['new_col'] = np.select([m1, m2], [1,2], default=df['Score'])
print(df)
   year  Score  new_col
0  2010   10.0     10.0
1  2011   15.0     15.0
2  2012    NaN      1.0
3  2013    NaN      2.0
4  2014    3.0      3.0
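If the number of year/value pairs grows, one mask per year becomes tedious. A possible alternative, which is not the np.select approach above but a different technique, is to map the years to fill values and pass the result to Series.fillna. The dictionary name fill_by_year is an assumption made for this sketch.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'year': [2010, 2011, 2012, 2013, 2014],
                   'Score': [10, 15, np.nan, np.nan, 3]})

# Hypothetical lookup table: year -> value to use when Score is missing.
fill_by_year = {2012: 1, 2013: 2}

# map() yields NaN for years not in the dict; fillna() only touches
# rows where Score is missing, so existing scores are kept as they are.
df['new_col'] = df['Score'].fillna(df['year'].map(fill_by_year))

print(df)
```

A missing Score in a year that is not listed in the dictionary stays NaN, which matches the np.select version, where the default falls back to the original Score.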
Answer 1 (score: 0)
condition_1 = (df['Score'].isnull()) & (df['year'] == 2012)
condition_2 = (df['Score'].isnull()) & (df['year'] == 2013)
values = [1, 2]
df['new_col'] = np.select([condition_1, condition_2], values, df['Score'])
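The signature of np.select is numpy.select(condlist, choicelist, default=0): a list of conditions, a list of values to choose from, and a default value used where no condition is True.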
df
   year  Score  new_col
0  2010   10.0     10.0
1  2011   15.0     15.0
2  2012    NaN      1.0
3  2013    NaN      2.0
4  2014    3.0      3.0
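One detail of np.select that matters once conditions can overlap: they are checked in order, the first one that is True wins, and the default is used where none match. A minimal sketch on a plain array, with values chosen only for illustration:

```python
import numpy as np

x = np.array([-2, 1, 5])

# x == 5 satisfies both conditions; the first one in the list wins.
result = np.select([x > 3, x > 0], [100, 10], default=-1)

print(result.tolist())  # [-1, 10, 100]
```

In the question's case the two conditions (year 2012 and year 2013) can never both be True for the same row, so their order does not affect the result.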