I have the following pandas DataFrame.
import numpy as np
import pandas as pd
df = pd.DataFrame({'Neighborhood': ['Marble Hill', 'Chelsea', 'Sutton Place'],
                   'Venue Category': ['Hospital', 'Bridge', 'School']})
Running it gives the following table.
   Neighborhood Venue Category
0   Marble Hill       Hospital
1       Chelsea         Bridge
2  Sutton Place         School
Now I want to assign a numeric value to each venue category.
Hospital - 5 marks
School - 4 marks
Bridge - 2 marks
So I tried to assign the marks with the code below. I want the marks to appear in a separate column.
def df2(df):
    if (df['Venue Category'] == 'Hospital'):
        return 5
    elif (df['Venue Category'] == 'School'):
        return 4
    elif (df['Venue Category'] != 'Hospital' or df['Venue Category'] != 'School'):
        return np.nan
df['Value'] = df.apply(df2, axis = 1)
When I run it, I get the following warning. How can I resolve this?
/home/jupyterlab/conda/envs/python/lib/python3.6/site-packages/ipykernel_launcher.py:9: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
if __name__ == '__main__':
Answer 0 (score: 1)
Create a dictionary covering all possible Venue Category values, then use Series.map. Any column value that is not a key in the dictionary is mapped to NaN:
df = pd.DataFrame({'Neighborhood': ['Marble Hill', 'Chelsea', 'Sutton Place', 'aaa'],
                   'Venue Category': ['Hospital', 'Bridge', 'School', 'a']})
print(df)
   Neighborhood Venue Category
0   Marble Hill       Hospital
1       Chelsea         Bridge
2  Sutton Place         School
3           aaa              a
d = {'Hospital': 5, 'School': 4, 'Bridge': 2}
df['Value'] = df['Venue Category'].map(d)
print(df)
   Neighborhood Venue Category  Value
0   Marble Hill       Hospital    5.0
1       Chelsea         Bridge    2.0
2  Sutton Place         School    4.0
3           aaa              a    NaN
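If unmapped categories should receive a default mark instead of NaN, the map result can be filled afterwards. The following is only a sketch: the default of 0 and the name `marks` are assumptions for illustration, not part of the question.

```python
import pandas as pd

df = pd.DataFrame({'Neighborhood': ['Marble Hill', 'Chelsea', 'Sutton Place', 'aaa'],
                   'Venue Category': ['Hospital', 'Bridge', 'School', 'a']})

# Marks taken from the question; categories missing from the dict map to NaN
marks = {'Hospital': 5, 'School': 4, 'Bridge': 2}

# Assumption: 0 is an acceptable default for unknown categories.
# fillna replaces the NaN values, astype(int) restores an integer column.
df['Value'] = df['Venue Category'].map(marks).fillna(0).astype(int)
print(df)
```

Keeping NaN, as in the answer above, is usually the safer choice when an unknown category should stay visible as missing data.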
A solution with np.select is also possible, but I think it is overcomplicated:
conditions = [df['Venue Category'] == 'Hospital',
              df['Venue Category'] == 'School',
              df['Venue Category'] == 'Bridge']
# Marks from the question: Hospital 5, School 4, Bridge 2
choices = [5, 4, 2]
df['Value'] = np.select(conditions, choices, default=np.nan)
print(df)
   Neighborhood Venue Category  Value
0   Marble Hill       Hospital    5.0
1       Chelsea         Bridge    2.0
2  Sutton Place         School    4.0
3           aaa              a    NaN
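As for the warning itself: SettingWithCopyWarning is typically raised when the DataFrame being assigned to was produced by slicing or filtering another DataFrame. The question does not show how df was created, so the following is only a sketch under that assumption; the parent frame `venues` and the filter on 'Cafe' are hypothetical.

```python
import pandas as pd

# Hypothetical parent frame; the question does not show how df was built
venues = pd.DataFrame({
    'Neighborhood': ['Marble Hill', 'Chelsea', 'Sutton Place', 'Chelsea'],
    'Venue Category': ['Hospital', 'Bridge', 'School', 'Cafe'],
})

# Taking an explicit copy of the filtered rows makes df an independent
# DataFrame, so adding a column to it does not touch (or warn about) venues
df = venues[venues['Venue Category'] != 'Cafe'].copy()

marks = {'Hospital': 5, 'School': 4, 'Bridge': 2}
df['Value'] = df['Venue Category'].map(marks)
print(df)
```

If df really was derived from another frame, adding `.copy()` at the point where it is created is usually enough; the new column is then written to df only.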