I have the following case-style statement in Python:
pd_df['difficulty'] = 'Unknown'
pd_df['difficulty'][(pd_df['Time']<30) & (pd_df['Time']>0)] = 'Easy'
pd_df['difficulty'][(pd_df['Time']>=30) & (pd_df['Time']<=60)] = 'Medium'
pd_df['difficulty'][pd_df['Time']>60] = 'Hard'
But when I run the code, it throws an error:
A value is trying to be set on a copy of a slice from a DataFrame
Answer 0 (score: 6)
Option 1
For performance, use nested np.where conditions. For the conditions, you can use pd.Series.between and fill in the default value accordingly.
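import numpy as np

# Note: pandas 1.3+ expects inclusive='neither'/'both'; older versions used inclusive=False/True.
pd_df['difficulty'] = np.where(
    pd_df['Time'].between(0, 30, inclusive='neither'),  # 0 < Time < 30
    'Easy',
    np.where(
        # 30 <= Time <= 60
        pd_df['Time'].between(30, 60, inclusive='both'), 'Medium', 'Unknown'
    )
)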
Option 2
Similarly, use np.select, which leaves more room for adding conditions:
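import numpy as np

# Note: pandas 1.3+ expects inclusive='neither'/'both'; older versions used inclusive=False/True.
pd_df['difficulty'] = np.select(
    [
        pd_df['Time'].between(0, 30, inclusive='neither'),  # 0 < Time < 30
        pd_df['Time'].between(30, 60, inclusive='both')     # 30 <= Time <= 60
    ],
    [
        'Easy',
        'Medium'
    ],
    default='Unknown'
)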
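To make the "room for adding conditions" point concrete, here is a minimal sketch that adds a third condition for the 'Hard' case from the question. The sample DataFrame is hypothetical, and the string values for inclusive assume pandas 1.3 or newer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, including a zero and a missing value
pd_df = pd.DataFrame({'Time': [10, 30, 45, 60, 90, 0, np.nan]})

# Conditions are checked in order; the first one that is True wins
conditions = [
    pd_df['Time'].between(0, 30, inclusive='neither'),  # 0 < Time < 30
    pd_df['Time'].between(30, 60, inclusive='both'),    # 30 <= Time <= 60
    pd_df['Time'] > 60,                                 # Time > 60
]
choices = ['Easy', 'Medium', 'Hard']

# Rows matching no condition (here 0 and NaN) get the default
pd_df['difficulty'] = np.select(conditions, choices, default='Unknown')
print(pd_df)
```

Each extra category is just one more entry in both lists, so the code stays flat rather than growing another level of nesting as it would with np.where.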
Option 3
Another performant solution uses loc:
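# Note: pandas 1.3+ expects inclusive='neither'/'both'; older versions used inclusive=False/True.
pd_df['difficulty'] = 'Unknown'
pd_df.loc[pd_df['Time'].between(0, 30, inclusive='neither'), 'difficulty'] = 'Easy'
pd_df.loc[pd_df['Time'].between(30, 60, inclusive='both'), 'difficulty'] = 'Medium'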
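The loc form avoids the warning because the row mask and the column label go into a single indexing call on pd_df itself, instead of assigning into the intermediate object returned by pd_df['difficulty']. As a rough end-to-end sketch, here it is extended with the 'Hard' case from the question. The sample data and the helper name time are illustrative assumptions, and the string values for inclusive assume pandas 1.3 or newer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
pd_df = pd.DataFrame({'Time': [5, 29, 30, 61, -3, np.nan]})

# Start from the default, then overwrite the rows that match each mask
pd_df['difficulty'] = 'Unknown'
time = pd_df['Time']  # read-only alias, used only to build the boolean masks

# One .loc call per category: row mask and column label in the same indexer
pd_df.loc[time.between(0, 30, inclusive='neither'), 'difficulty'] = 'Easy'
pd_df.loc[time.between(30, 60, inclusive='both'), 'difficulty'] = 'Medium'
pd_df.loc[time > 60, 'difficulty'] = 'Hard'

print(pd_df)
```

Negative and missing times match none of the masks, so they keep the 'Unknown' default.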