How do I select the minimum and maximum values for a horizontal lollipop plot / dumbbell chart?

Time: 2019-05-31 15:00:17

Tags: python pandas numpy matplotlib seaborn

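I created a dumbbell chart, but each category shows too many minimum and maximum values. I only want to show one sky-blue dot (the lowest price) and one green dot (the highest price) per area.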

Here is the chart so far:

My dumbbell chart

Here is my DataFrame:

The DataFrame

Here is a link to the full dataset:

https://drive.google.com/open?id=1PpI6PlO8ox2vKfM4aGmEUexCPPWa59S_

Here is my code so far:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    db = df[['minPrice','maxPrice', 'neighbourhood_hosts']]
    ordered_db = db.sort_values(by='minPrice')
    my_range = ordered_db['neighbourhood_hosts']  # take y values from the sorted frame so they stay aligned with the prices

    plt.figure(figsize=(8,6))
    plt.hlines(y=my_range, xmin=ordered_db['minPrice'], xmax=ordered_db['maxPrice'], color='grey', alpha=0.4)
    plt.scatter(ordered_db['minPrice'], my_range, color='skyblue', alpha=1, label='minimum price')
    plt.scatter(ordered_db['maxPrice'], my_range, color='green', alpha=0.4 , label='maximum price')
    plt.legend()


    plt.title("Comparison of the minimum and maximum prices")
    plt.xlabel('Value range')
    plt.ylabel('Area')

How can I change my code so that each area has only one minimum and one maximum value?

1 answer:

Answer 0: (score: 1)

As discussed, here is the script:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('dumbbell data.csv')
db = df[['minPrice','maxPrice', 'neighbourhood_hosts']]
#create max and min price based on area name
max_price = db.groupby(['neighbourhood_hosts'])['maxPrice'].max().reset_index()
min_price = db.groupby(['neighbourhood_hosts'])['minPrice'].min().reset_index()
var_price = pd.DataFrame()
var_price['range'] = max_price.maxPrice-min_price.minPrice
var_price['neighbourhood_hosts'] = min_price['neighbourhood_hosts']
var_price = var_price.sort_values(by='range')

#sort max and min price by price range (maxPrice - minPrice)
max_price = max_price.reindex(var_price.index)
min_price = min_price.reindex(var_price.index)

plt.figure(figsize=(8,6))
plt.hlines(y=min_price['neighbourhood_hosts'], xmin=min_price['minPrice'], xmax=max_price['maxPrice'], color='grey', alpha=0.4)
plt.scatter(min_price['minPrice'], min_price['neighbourhood_hosts'], color='skyblue', alpha=1, label='minimum price')
plt.scatter(max_price['maxPrice'], max_price['neighbourhood_hosts'], color='green', alpha=0.4 , label='maximum price')
plt.legend()


plt.title("Comparison of the minimum and maximum prices")
plt.xlabel('Value range')
plt.ylabel('Area')
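
The grouping step above can also be sketched in a single pass with named aggregation (available in pandas 0.25 and later), which keeps the minimum, maximum, and area name in one frame so nothing needs to be reindexed afterwards. This is only a minimal sketch, not the answerer's method: the small DataFrame below is made-up sample data standing in for the real CSV, and `summary` is a hypothetical name.

```python
import pandas as pd

# Hypothetical sample data: several listings per area, as in the question.
df = pd.DataFrame({
    "neighbourhood_hosts": ["A", "A", "B", "B", "B", "C"],
    "minPrice": [30, 20, 50, 45, 60, 10],
    "maxPrice": [80, 120, 90, 70, 150, 40],
})

# One row per area: the lowest minPrice and the highest maxPrice.
summary = (
    df.groupby("neighbourhood_hosts")
      .agg(minPrice=("minPrice", "min"), maxPrice=("maxPrice", "max"))
      .reset_index()
)

# Order the areas by the width of their price range.
summary["range"] = summary["maxPrice"] - summary["minPrice"]
summary = summary.sort_values("range").reset_index(drop=True)

print(summary)
```

The columns of `summary` can then be passed to the same `plt.hlines` and `plt.scatter` calls as in the script, using `summary['neighbourhood_hosts']` for `y`, `summary['minPrice']` for the sky-blue points, and `summary['maxPrice']` for the green points.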
