How do I get the value of each median in a boxplot?

Date: 2019-10-07 04:27:59

Tags: python pandas boxplot

The dataset is from Kaggle.

This code

import pandas as pd

melbourne_file_path = './melb_data.csv'
melbourne_data = pd.read_csv(melbourne_file_path)
# Drop every row that has a missing value in any column
filtered_melbourne_data = melbourne_data.dropna(axis=0)
ax = filtered_melbourne_data.boxplot(column='Price', by='Regionname');

produces a boxplot of Price for each Regionname.

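The boxplot already holds a lot of information, such as the medians. Is there a way to extract those values, matched to the groups given in `by`?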

I tried this code, adapted from another post:
ax, bp = filtered_melbourne_data.boxplot(column = 'Price', by = 'Regionname', return_type='both');

and got this error:

ValueError: not enough values to unpack (expected 2, got 1)
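The reason for this error can be checked on a tiny frame. The sketch below uses made-up prices (a hypothetical stand-in for melb_data.csv) and forces Matplotlib's non-interactive Agg backend, an assumption so it runs without a display. It shows that when `by` is given, `return_type='both'` returns a one-element Series indexed by column name rather than an `(ax, dict)` tuple, so unpacking it into two names fails.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

# Hypothetical stand-in for the filtered Melbourne data
df = pd.DataFrame({
    "Regionname": ["West", "North", "South", "North", "West",
                   "South", "North", "South", "West"],
    "Price": [50.0, 100.0, 500.0, 300.0, 70.0, 400.0, 200.0, 600.0, 60.0],
})

result = df.boxplot(column="Price", by="Regionname", return_type="both")

# With `by`, the result is a Series keyed by column name, not an (ax, dict) tuple
print(type(result).__name__)  # Series
print(list(result.index))     # ['Price']

# Unpacking its single element into two names reproduces the error
try:
    ax, bp = result
except ValueError as exc:
    print(exc)  # not enough values to unpack (expected 2, got 1)
```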

I also tried this code, adapted from the same post:

ax = filtered_melbourne_data.boxplot(column = 'Price', by = 'Regionname', return_type='both');
print(ax.median)

which printed:

<bound method Series.median of Price    (AxesSubplot(0.1,0.15;0.8x0.75), {'whiskers': ...
dtype: object>
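The output is a bound method because the returned object is that same one-element Series, so `.median` resolves to pandas' own `Series.median` rather than to anything from the plot. A sketch on the same hypothetical data (Agg backend assumed) showing where the plot objects actually live:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

# Hypothetical stand-in for the filtered Melbourne data
df = pd.DataFrame({
    "Regionname": ["West", "North", "South", "North", "West",
                   "South", "North", "South", "West"],
    "Price": [50.0, 100.0, 500.0, 300.0, 70.0, 400.0, 200.0, 600.0, 60.0],
})

result = df.boxplot(column="Price", by="Regionname", return_type="both")

# `result.median` is pandas' Series.median method, unrelated to the plot
print(callable(result.median))  # True

# The single element, keyed by the column name, holds the axes and the line dict
ax, lines = result["Price"]
print(sorted(lines))          # ['boxes', 'caps', 'fliers', 'means', 'medians', 'whiskers']
print(len(lines["medians"]))  # 3, one median line per region
```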

How can I get the median for each Regionname?

Answer (score: 2)

It is possible, but the solution from that post needs a couple of changes.

First, append ['Price'] to pull the value out of the one-element Series that boxplot returns when `by` is used:

ax, bp = filtered_melbourne_data.boxplot(column = 'Price', 
                                         by = 'Regionname', 
                                         return_type='both')['Price']

Then take the first value of each median line's y-data array with index [0]:

medians = [median.get_ydata()[0] for median in bp["medians"]]
print (medians)
[990000.0, 670000.0, 715000.0, 590000.0, 780000.0, 1230000.0, 700000.0, 400000.0]
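To make those medians correspond to the `by` groups, one option is to pair them with the x tick labels, which pandas sets to the group names in the same left-to-right order as the boxes. The same numbers can also be computed directly with `groupby(...).median()`. This is a sketch on hypothetical data with the Agg backend assumed; the `canvas.draw()` call is a precaution so the tick labels are populated on older Matplotlib versions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

# Hypothetical stand-in for the filtered Melbourne data
df = pd.DataFrame({
    "Regionname": ["West", "North", "South", "North", "West",
                   "South", "North", "South", "West"],
    "Price": [50.0, 100.0, 500.0, 300.0, 70.0, 400.0, 200.0, 600.0, 60.0],
})

ax, bp = df.boxplot(column="Price", by="Regionname", return_type="both")["Price"]

# Each median line is drawn at y = median, so ydata[0] is the value
medians = [float(line.get_ydata()[0]) for line in bp["medians"]]

# Tick labels carry the group names in box order; draw once so they exist
ax.figure.canvas.draw()
regions = [label.get_text() for label in ax.get_xticklabels()]
by_region = dict(zip(regions, medians))
print(by_region)  # {'North': 200.0, 'South': 500.0, 'West': 60.0}

# Same numbers straight from pandas, without drawing anything
print(df.groupby("Regionname")["Price"].median().to_dict())
```

If only the numbers are needed, the `groupby` line alone is enough; reading them back from the plot is mainly useful when the boxplot has already been drawn.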