Replace certain values in a column with a string

Time: 2019-11-04 22:28:23

Tags: python pandas numpy

This is my current DataFrame:

sports_gpa  music_gpa  Activity  Sport
2           3          nan       nan
0           2          nan       nan
3           3.5        nan       nan
2           1          nan       nan

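I have the following condition:

If 'sports_gpa' is greater than 0 and 'music_gpa' is greater than 'sports_gpa', fill the 'Activity' column with the value of 'sports_gpa' and fill the 'Sport' column with the string 'basketball'.

Expected output: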

sports_gpa  music_gpa  Activity  Sport
2           3          2         basketball
0           2          nan       nan
3           3.5        3         basketball
2           1          nan       nan

To do this, I use the following statement:

df['Activity'], df['Sport'] = np.where((df['sports_gpa'] > 0) & (df['music_gpa'] > df['sports_gpa']), (df['sports_gpa'], 'basketball'), (df['Activity'], df['Sport']))

This, of course, raises an error saying that the operands could not be broadcast together with the given shapes.
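The shape mismatch can be seen directly. The sketch below rebuilds the sample frame from the question (the column dtypes and the names `cond`, `y`, and `raised` are assumptions made for illustration) and shows that two Series stack into a regular 2 x 4 array, whereas a Series paired with a bare string does not. The exact wording of the error may vary between NumPy versions.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (dtypes are an assumption)
df = pd.DataFrame({
    'sports_gpa': [2, 0, 3, 2],
    'music_gpa': [3.0, 2.0, 3.5, 1.0],
    'Activity': np.nan,
    'Sport': np.nan,
})

cond = (df['sports_gpa'] > 0) & (df['music_gpa'] > df['sports_gpa'])

# Two equal-length Series stack into a regular 2 x 4 array
y = np.asarray((df['Activity'], df['Sport']), dtype=object)
print(cond.shape, y.shape)

# A Series paired with a bare string has no regular 2 x 4 shape,
# so NumPy cannot line it up with the condition and the fallback values
raised = False
try:
    np.where(cond, (df['sports_gpa'], 'basketball'), (df['Activity'], df['Sport']))
except ValueError as exc:
    raised = True
    print('ValueError:', exc)
```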

To work around this, I can add a helper column to the DataFrame.

df.loc[:, 'str'] = 'basketball'
df['Activity'], df['Sport'] = np.where((df['sports_gpa'] > 0) & (df['music_gpa'] > df['sports_gpa']), (df['sports_gpa'], df['str']), (df['Activity'], df['Sport']))

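This gives me the expected output.

Is there a way to avoid this error without creating a new column, so that the string 'basketball' can be written to the 'Sport' column directly in the np.where statement?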

2 answers:

Answer 0: (score: 0)

Use np.where + Series.fillna:

# gt(0) matches the stated condition 'sports_gpa' > 0
where = df['sports_gpa'].gt(0) & (df['sports_gpa'] < df['music_gpa'])
df['Activity'], df['Sport'] = np.where(where, (df['sports_gpa'], df['Sport'].fillna('basketball')), (df['Activity'], df['Sport']))

You can also use Series.where + Series.mask:

df['Activity'] = df['sports_gpa'].where(where)
df['Sport'] = df['Sport'].mask(where, 'basketball')
print(df)

   sports_gpa  music_gpa  Activity       Sport
0           2        3.0       2.0  basketball
1           0        2.0       NaN         NaN
2           3        3.5       3.0  basketball
3           2        1.0       NaN         NaN
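The same one-column-at-a-time idea also works with np.where itself, because a scalar such as 'basketball' broadcasts against a one-dimensional condition. This is a minimal sketch rather than part of the original answer; it assumes the sample frame from the question and stores 'Sport' as an object column so that it can hold both strings and NaN. The name `cond` is introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (dtypes are an assumption)
df = pd.DataFrame({
    'sports_gpa': [2, 0, 3, 2],
    'music_gpa': [3.0, 2.0, 3.5, 1.0],
    'Activity': np.nan,
    # object dtype so the column can hold both strings and NaN
    'Sport': pd.Series([np.nan] * 4, dtype=object),
})

cond = (df['sports_gpa'] > 0) & (df['music_gpa'] > df['sports_gpa'])

# One np.where call per column: each call only sees 1-D inputs,
# and the scalar 'basketball' broadcasts against the condition
df['Activity'] = np.where(cond, df['sports_gpa'], df['Activity'])
df['Sport'] = np.where(cond, 'basketball', df['Sport'])
print(df)
```

Splitting the assignment this way avoids stacking values of different shapes into one tuple, which is what triggered the broadcasting error in the question.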

Answer 1: (score: 0)

I just found out that I can do this:

df['Activity'], df['Sport'] = np.where((df['sports_gpa'] > 0) & (df['music_gpa'] > df['sports_gpa']), (df['sports_gpa'], df['Sport'].astype(str).replace({"nan": "basketball"})), (df['Activity'], df['Sport']))