How to make a matplotlib / pandas bar chart look like a histogram?

Asked: 2016-05-31 14:40:33

Tags: python numpy pandas matplotlib plot

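Plotting differences between bar and hist

Given some data in a pandas.Series, rv, there is a difference between

  1. Calling hist directly on the data to plot

  2. Calculating the histogram results (with numpy.histogram) and then plotting them with bar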

Example data generation

    %matplotlib inline
    
    import numpy as np
    import pandas as pd
    import scipy.stats as stats
    import matplotlib
    matplotlib.rcParams['figure.figsize'] = (12.0, 8.0)
    matplotlib.style.use('ggplot')
    
    # Setup size and distribution
    size = 50000
    distribution = stats.norm()
    
    # Create random data
    rv = pd.Series(distribution.rvs(size=size))
    # Get sane start and end points of distribution
    start = distribution.ppf(0.01)
    end = distribution.ppf(0.99)
    
    # Build PDF and turn into pandas Series
    x = np.linspace(start, end, size)
    y = distribution.pdf(x)
    pdf = pd.Series(y, x)
    
    # Get histogram of random data
    y, x = np.histogram(rv, bins=50, density=True)
    # Convert the 51 bin edges into 50 bin centers
    x = [(a+x[i+1])/2.0 for i,a in enumerate(x[0:-1])]
    hist = pd.Series(y, x)
    

hist() plotting

    ax = pdf.plot(lw=2, label='PDF', legend=True)
    rv.plot(kind='hist', bins=50, density=True, alpha=0.5, label='Random Samples', legend=True, ax=ax)
    

[image: hist plotting]

bar() plotting

    ax = pdf.plot(lw=2, label='PDF', legend=True)
    hist.plot(kind='bar', alpha=0.5, label='Random Samples', legend=True, ax=ax)
    

[image: bar plotting]

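How can I make the bar plot look like the hist plot?

The use case is that only the histogrammed data needs to be saved and plotted later (it is usually much smaller than the original data).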
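For context, the "save now, plot later" use case might look something like the sketch below. It is only an illustration: the `.npz` storage format, the file name `hist_data.npz`, and the stand-in data `raw` are assumptions, not part of the original question.

```python
import os
import tempfile

import numpy as np

# Hypothetical raw data standing in for `rv` from the question.
rng = np.random.default_rng(0)
raw = rng.normal(size=50000)

# Histogram once; keep the 51 bin edges and the 50 density values.
density, edges = np.histogram(raw, bins=50, density=True)

# Assumed storage format: a single .npz file (the name is illustrative only).
path = os.path.join(tempfile.mkdtemp(), "hist_data.npz")
np.savez(path, density=density, edges=edges)

# Later, possibly in another session: reload without the raw data.
with np.load(path) as stored:
    loaded_density = stored["density"]
    loaded_edges = stored["edges"]

print(raw.size, loaded_density.size, loaded_edges.size)
```

Storing the bin edges rather than the bin centers keeps the exact bin widths available at plot time.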

2 Answers:

Answer 0 (score: 8)

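Bar plotting differences

Obtaining a bar plot that looks like the hist plot requires some manipulation of the default behavior of bar.

  1. Force bar to use the actual x data for the plotting range by passing both x (hist.index) and y (hist.values). By default, bar plots the y data against an arbitrary range and uses the x data as tick labels.
  2. Set the width parameter to match the actual step size of the x data (the default is 0.8).
  3. Set the align parameter to 'center'.
  4. Manually set the axis legend.

These changes need to be made through matplotlib's bar() called on the axis (ax), rather than pandas' bar() called on the data (hist).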

Example plotting

    %matplotlib inline
    
    import numpy as np
    import pandas as pd
    import scipy.stats as stats
    import matplotlib
    matplotlib.rcParams['figure.figsize'] = (12.0, 8.0)
    matplotlib.style.use('ggplot')
    
    # Setup size and distribution
    size = 50000
    distribution = stats.norm()
    
    # Create random data
    rv = pd.Series(distribution.rvs(size=size))
    # Get sane start and end points of distribution
    start = distribution.ppf(0.01)
    end = distribution.ppf(0.99)
    
    # Build PDF and turn into pandas Series
    x = np.linspace(start, end, size)
    y = distribution.pdf(x)
    pdf = pd.Series(y, x)
    
    # Get histogram of random data
    y, x = np.histogram(rv, bins=50, density=True)
    # Convert the 51 bin edges into 50 bin centers
    x = [(a+x[i+1])/2.0 for i,a in enumerate(x[0:-1])]
    hist = pd.Series(y, x)
    
    # Plot previously histogrammed data
    ax = pdf.plot(lw=2, label='PDF', legend=True)
    # Bar width = spacing between adjacent bin centers (positive for any sign of x)
    w = hist.index[1] - hist.index[0]
    ax.bar(hist.index, hist.values, width=w, alpha=0.5, align='center')
    ax.legend(['PDF', 'Random Samples'])
    

[image: histogrammed plot]
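A possible variant of the steps above, if the bin edges are kept instead of the bin centers: pass the left edges as x, take the widths from `np.diff`, and use `align='edge'`. This is a sketch rather than part of the original answer; `raw` is hypothetical stand-in data, and the Agg backend line is only there so the snippet runs without a display (drop it in a notebook).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in for the previously histogrammed data.
rng = np.random.default_rng(0)
raw = rng.normal(size=50000)
density, edges = np.histogram(raw, bins=50, density=True)

fig, ax = plt.subplots()
# One bar per bin: left edge as x, per-bin width, aligned on the left edge.
widths = np.diff(edges)
bars = ax.bar(edges[:-1], density, width=widths, align="edge",
              alpha=0.5, label="Random Samples")
ax.legend()
```

Because each bar gets its own width, this also works when the bins are not equally spaced, which a single scalar width cannot handle.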

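Answer 1 (score: 1)

Another, simpler solution is to create fake samples that reproduce the same histogram and then simply use hist().

That is, after retrieving bins and counts from the stored data, do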

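    import numpy as np
    import matplotlib.pyplot as plt

    # bins: n+1 bin edges; counts: n raw integer counts (not densities)
    fake = []
    for i in range(len(counts)):
        a, b = bins[i], bins[i+1]
        # Draw counts[i] uniform samples inside the bin [a, b)
        fake.append(a + (b - a) * np.random.rand(int(counts[i])))
    fake = np.concatenate(fake)

    plt.hist(fake, bins=bins)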
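If generating random fake samples is undesirable, a related trick is to pass one value per bin and let the `weights` argument of hist() carry the stored counts. This is a sketch and not part of the original answer; `raw` is hypothetical stand-in data and the Agg backend is only used so the snippet runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stored histogram: raw counts plus bin edges.
rng = np.random.default_rng(0)
raw = rng.normal(size=50000)
counts, bins = np.histogram(raw, bins=50)

fig, ax = plt.subplots()
# Each left bin edge acts as a single "sample", weighted by that bin's count.
n, out_bins, patches = ax.hist(bins[:-1], bins=bins, weights=counts, alpha=0.5)
```

Unlike the fake-sample approach, this needs no randomness and also accepts non-integer values such as densities.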