I have a list of variables that are numpy arrays. I want to draw them as box plots in a single figure with seaborn.
subscribers=bankData.loc[bankData['deposit']==1] # only customers who subscribed to a term deposit
occupations=bankData['job'].unique().tolist()
admin=subscribers['age'].loc[subscribers['job']=='admin.'].values
technician=subscribers['age'].loc[subscribers['job']=='technician'].values
services=subscribers['age'].loc[subscribers['job']=='services'].values
management=subscribers['age'].loc[subscribers['job']=='management'].values
retired=subscribers['age'].loc[subscribers['job']=='retired'].values
blue_collar=subscribers['age'].loc[subscribers['job']=='blue-collar'].values
unemployed=subscribers['age'].loc[subscribers['job']=='unemployed'].values
enterpreneur=subscribers['age'].loc[subscribers['job']=='enterpreneur'].values
housemaid=subscribers['age'].loc[subscribers['job']=='housemaid'].values
unknown= subscribers['age'].loc[subscribers['job']=='unknown'].values
self_employed=subscribers['age'].loc[subscribers['job']=='self-employed'].values
student=subscribers['age'].loc[subscribers['job']=='student'].values
occpuation_age=[admin, technician,services, management, retired, blue_collar, unemployed, enterpreneur, housemaid,
unknown, self_employed, student]
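I want one box plot for each item in occpuation_age.
Answer 0: (score: 1)
There is no need to split the DataFrame into separate numpy arrays; just pass the column names to the seaborn plot: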
sns.boxplot(x='job', y='age', data=subscribers)
To demonstrate with randomly generated, seeded data:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
np.random.seed(682019)
occupations = ['admin', 'technician', 'management', 'retired', 'blue_collar',
'unemployed', 'enterpreneur', 'housemaid',
'unknown', 'self_employed', 'student']
subscribers = pd.DataFrame({'job': np.random.choice(occupations, 100),
'age': np.random.uniform(0, 100, 100)})
print(subscribers.head(10))
#              job        age
# 0     technician   2.188924
# 1    blue_collar  40.868834
# 2     management  44.179859
# 3     technician  72.193644
# 4   enterpreneur  83.680639
# 5   enterpreneur  60.923324
# 6        student  99.163055
# 7     management  80.392648
# 8        unknown  96.985044
# 9  self_employed  92.147679
fig, ax = plt.subplots(figsize=(14,5))
sns.boxplot(y='age', x='job', data=subscribers, ax=ax)
plt.show()
plt.clf()
plt.close()
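If the data exists only as separate numpy arrays, as in the question, they can be stacked back into the long format that the call above expects. The following is a minimal sketch; the three small arrays and the name arrays_by_job are hypothetical stand-ins, not values from the real bank data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the per-occupation age arrays in the question
admin = np.array([25, 31, 47])
technician = np.array([29, 38])
student = np.array([19, 22, 24, 21])

# Map each label to its array; the labels become the x-axis categories
arrays_by_job = {'admin.': admin, 'technician': technician, 'student': student}

# Long form: one row per observation, with a 'job' column and an 'age' column
long_df = pd.DataFrame({
    # Repeat each label once per element of its array
    'job': np.repeat(list(arrays_by_job.keys()),
                     [len(a) for a in arrays_by_job.values()]),
    # Concatenate the arrays in the same order as the labels
    'age': np.concatenate(list(arrays_by_job.values())),
})
print(long_df)
```

The resulting frame can then be plotted with sns.boxplot(x='job', y='age', data=long_df), the same way as subscribers.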
To sort the boxes in descending order of median age, add a helper column holding the per-group aggregate with groupby().transform(), then sort by that column:
# The helper column holds each job's median age, so name it accordingly
subscribers['job_median'] = subscribers.groupby('job')['age'].transform('median')
subscribers = subscribers.sort_values('job_median', ascending=False)
fig, ax = plt.subplots(figsize=(14,5))
sns.boxplot(y='age', x='job', data=subscribers, ax=ax)
plt.show()
plt.clf()
plt.close()
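An alternative that leaves the DataFrame untouched is to compute the category order separately and pass it through the order parameter of sns.boxplot. This is only a sketch: the helper names median_order and plot_sorted_boxes and the tiny demo frame are made up for illustration.

```python
import pandas as pd

def median_order(df, by='job', value='age'):
    """Return the categories of `by`, sorted by descending median of `value`."""
    medians = df.groupby(by)[value].median()
    return medians.sort_values(ascending=False).index.tolist()

def plot_sorted_boxes(df):
    # Imports are kept local so median_order can be used without the plotting stack
    import matplotlib.pyplot as plt
    import seaborn as sns
    fig, ax = plt.subplots(figsize=(14, 5))
    # `order` fixes the left-to-right order of the boxes
    sns.boxplot(x='job', y='age', data=df, order=median_order(df), ax=ax)
    plt.show()

# Hypothetical demo data: medians are a=35, b=65, c=22.5
demo = pd.DataFrame({'job': ['a', 'a', 'b', 'b', 'c', 'c'],
                     'age': [30, 40, 60, 70, 20, 25]})
print(median_order(demo))
```

Calling plot_sorted_boxes(subscribers) should draw the same sorted plot without adding a helper column or reordering the rows.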