bar
和hist
鉴于pandas.Series
,rv
中的某些数据,
直接在要绘制的数据上调用hist
计算直方图结果(使用numpy.histogram
),然后使用bar
%matplotlib inline
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib
matplotlib.rcParams['figure.figsize'] = (12.0, 8.0)
matplotlib.style.use('ggplot')
# Setup size and distribution
size = 50000
distribution = stats.norm()
# Create random data
rv = pd.Series(distribution.rvs(size=size))
# Get sane start and end points of distribution
start = distribution.ppf(0.01)
end = distribution.ppf(0.99)
# Build PDF and turn into pandas Series
x = np.linspace(start, end, size)
y = distribution.pdf(x)
pdf = pd.Series(y, x)
# Get histogram of random data
y, x = np.histogram(rv, bins=50, normed=True)
# Correct bin edge placement
x = [(a+x[i+1])/2.0 for i,a in enumerate(x[0:-1])]
hist = pd.Series(y, x)
hist()
绘图ax = pdf.plot(lw=2, label='PDF', legend=True)
rv.plot(kind='hist', bins=50, normed=True, alpha=0.5, label='Random Samples', legend=True, ax=ax)
bar()
绘图ax = pdf.plot(lw=2, label='PDF', legend=True)
hist.plot(kind='bar', alpha=0.5, label='Random Samples', legend=True, ax=ax)
bar
绘图看起来像hist
图?此用例需要仅保存要使用的直方图数据并稍后绘图(其大小通常小于原始数据)。
答案 0 :(得分:8)
获取看起来像bar
图的hist
图需要对bar
的默认行为进行一些操作。
bar
通过传递x(hist.index
)和y(hist.values
)来使用实际x数据绘制范围。 The default bar
behavior is to plot the y data against an arbitrary range and put the x data as the label。width
参数设置为与x数据的实际步长相关(默认为0.8
)align
参数设置为'center'
。这些更改需要通过matplotlib
bar()
调用轴(ax
)而不是调用pandas
的bar()
来进行数据(hist
)。
%matplotlib inline
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib
matplotlib.rcParams['figure.figsize'] = (12.0, 8.0)
matplotlib.style.use('ggplot')
# Setup size and distribution
size = 50000
distribution = stats.norm()
# Create random data
rv = pd.Series(distribution.rvs(size=size))
# Get sane start and end points of distribution
start = distribution.ppf(0.01)
end = distribution.ppf(0.99)
# Build PDF and turn into pandas Series
x = np.linspace(start, end, size)
y = distribution.pdf(x)
pdf = pd.Series(y, x)
# Get histogram of random data
y, x = np.histogram(rv, bins=50, normed=True)
# Correct bin edge placement
x = [(a+x[i+1])/2.0 for i,a in enumerate(x[0:-1])]
hist = pd.Series(y, x)
# Plot previously histogrammed data
ax = pdf.plot(lw=2, label='PDF', legend=True)
w = abs(hist.index[1]) - abs(hist.index[0])
ax.bar(hist.index, hist.values, width=w, alpha=0.5, align='center')
ax.legend(['PDF', 'Random Samples'])
答案 1 :(得分:1)
另一个更简单的解决方案是创建重现相同直方图的假样本,然后简单地使用hist()。
即,从存储的数据中检索bins
和counts
后,执行
fake = np.array([])
for i in range(len(counts)):
a, b = bins[i], bins[i+1]
sample = a + (b-a)*np.random.rand(counts[i])
fake = np.append(fake, sample)
plt.hist(fake, bins=bins)