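I have a Pandas dataframe with columns representing the year, the month within the year, and a binary outcome (0/1). I want to plot a column of bar charts, with one bar chart per year. I used the subplots() function in matplotlib.pyplot with sharex = True and sharey = True. The charts look fine, except that the padding between the y-ticks and the y-tick labels is different on the final (bottom) chart.
An example dataframe can be created as follows: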
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Generate dataframe containing dates over several years and a random binary outcome
tempDF = pd.DataFrame()
tempDF['date'] = pd.date_range(start = pd.to_datetime('2014-01-01'),end = pd.to_datetime('2017-12-31'))
tempDF['case'] = np.random.choice([0,1],size = [len(tempDF.index)],p = [0.9,0.1])
# Create a dataframe that summarises proportion of cases per calendar month
tempGroupbyCalendarMonthDF = tempDF.groupby([tempDF['date'].dt.year,tempDF['date'].dt.month]).agg({'case': 'sum',
                                                                                                  'date': 'count'})
tempGroupbyCalendarMonthDF.index.names = ['year','month']
tempGroupbyCalendarMonthDF = tempGroupbyCalendarMonthDF.reset_index()
# Rename columns to something more meaningful
tempGroupbyCalendarMonthDF = tempGroupbyCalendarMonthDF.rename(columns = {'case': 'numberCases',
'date': 'monthlyTotal'})
# Calculate percentage positive cases per month
tempGroupbyCalendarMonthDF['percentCases'] = (tempGroupbyCalendarMonthDF['numberCases']/tempGroupbyCalendarMonthDF['monthlyTotal'])*100
The final dataframe looks like this:
    year  month  monthlyTotal  numberCases  percentCases
0   2014      1            31            5     16.129032
1   2014      2            28            5     17.857143
2   2014      3            31            3      9.677419
3   2014      4            30            1      3.333333
4   2014      5            31            4     12.903226
..   ...    ...           ...          ...           ...
43  2017      8            31            2      6.451613
44  2017      9            30            2      6.666667
45  2017     10            31            3      9.677419
46  2017     11            30            2      6.666667
47  2017     12            31            1      3.225806
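As a quick sanity check on the percentCases column, the same figure can probably be obtained as the mean of the 0/1 column per calendar month, since the mean of a binary column is the proportion of ones. The sketch below assumes the same date range as above; the names checkDF, viaCounts and viaMean are hypothetical and only used for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical names (checkDF, viaCounts, viaMean) are for illustration only.
rng = np.random.default_rng(0)
checkDF = pd.DataFrame({'date': pd.date_range('2014-01-01', '2017-12-31')})
checkDF['case'] = rng.choice([0, 1], size=len(checkDF), p=[0.9, 0.1])

# Group the binary outcome by calendar year and month
keys = [checkDF['date'].dt.year.rename('year'),
        checkDF['date'].dt.month.rename('month')]
grouped = checkDF.groupby(keys)['case']

# Percentage computed as in the question: cases / days in month * 100
viaCounts = grouped.sum() / grouped.count() * 100
# The same quantity expressed as the mean of the 0/1 column
viaMean = grouped.mean() * 100
```

If the two series agree, the summary step is doing what it is meant to, and any oddity in the figure is more likely to come from the plotting code.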
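The figure is then produced with the code below. The subplots() function is used to return an array of axes. The code steps through each axis and plots the values. The x-axis ticks and labels are displayed only on the final (bottom) plot. Finally, to get a common y-axis label, an additional subplot is added that covers all the bar charts, with all of its axes and labels hidden except for the y-axis label.
# Calculate minimum and maximum years in dataset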
minYear = tempDF['date'].min().year
maxYear = tempDF['date'].max().year
# Set a few parameters
barWidth = 0.80
labelPositionX = 0.872
labelPositionY = 0.60
numberSubplots = maxYear - minYear + 1
fig, axArr = plt.subplots(numberSubplots,figsize = [8,10],sharex = True,sharey = True)
# Keep track of which year to plot, starting with first year in dataset.
currYear = minYear
# Step through each subplot
for ax in axArr:
    # Plot the data
    rects = ax.bar(tempGroupbyCalendarMonthDF.loc[tempGroupbyCalendarMonthDF['year'] == currYear,'month'],
                   tempGroupbyCalendarMonthDF.loc[tempGroupbyCalendarMonthDF['year'] == currYear,'percentCases'],
                   width = barWidth)
    # Format the axes
    ax.set_xlim([0.8,13])
    ax.set_ylim([0,40])
    ax.grid(True)
    ax.tick_params(axis = 'both',
                   left = True,
                   bottom = False,
                   top = False,
                   right = False,
                   direction = 'out',
                   length = 4,
                   width = 2,
                   labelsize = 14)
    # Turn on the x-axis ticks and labels for final plot only
    if currYear == maxYear:
        ax.tick_params(bottom = True)
        xtickPositions = [1,2,3,4,5,6,7,8,9,10,11,12]
        ax.set_xticks([x + barWidth/2 for x in xtickPositions])
        ax.set_xticklabels(['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'])
    ytickPositions = [10,20,30]
    ax.set_yticks(ytickPositions)
    # Add label (in this case, the year) to each subplot
    # (The transform = ax.transAxes makes positions relative to the current axis.)
    ax.text(labelPositionX, labelPositionY,str(currYear),
            horizontalalignment = 'center',
            verticalalignment = 'center',
            transform = ax.transAxes,
            family = 'sans-serif',
            fontweight = 'bold',
            fontsize = 18,
            color = 'gray')
    # Onto the next year...
    currYear = currYear + 1
# Fine-tune overall figure
# ========================
# Make subplots close to each other.
fig.subplots_adjust(hspace=0)
# To display a common y-axis label, create a large axis that covers all the subplots
fig.add_subplot(111, frameon=False)
# Hide ticks and tick labels of the big axis
plt.tick_params(labelcolor='none', top=False, bottom=False, left=False, right=False)
# Add y label that spans several subplots
plt.ylabel("Percentage cases (%)", fontsize = 16, labelpad = 20)
plt.show()
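The resulting figure is almost exactly what I want, but the y-axis tick labels on the bottom chart sit further from the axis than on all the other charts. If the number of subplots changes (by using a wider range of dates), the same pattern occurs, i.e. only the final subplot looks different.
I'm almost certainly not seeing the wood for the trees, but can anyone spot what I've done wrong?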
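To put a number on the difference in padding described above, one option might be to measure the gap between each y-tick label and the left edge of its axes after drawing. This is only a diagnostic sketch on a simplified three-panel figure with made-up bar heights, not the code that produced the figure; the names demoAxArr and gaps are hypothetical.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Simplified stand-in for the figure in the question (hypothetical data)
demoFig, demoAxArr = plt.subplots(3, sharex=True, sharey=True)
for ax in demoAxArr:
    ax.bar([1, 2, 3], [10, 20, 30])
    ax.set_yticks([10, 20, 30])
    ax.tick_params(axis='both', direction='out', length=4, width=2)

# Draw once so that tick labels have their final positions
demoFig.canvas.draw()
renderer = demoFig.canvas.get_renderer()

# Gap in pixels between the right edge of the first y-tick label
# and the left edge of the axes it belongs to
gaps = []
for ax in demoAxArr:
    label = ax.get_yticklabels()[0]
    labelBox = label.get_window_extent(renderer)
    axesBox = ax.get_window_extent(renderer)
    gaps.append(axesBox.x0 - labelBox.x1)

print(gaps)
```

If one entry in gaps is clearly larger than the others, that axes is the one whose tick parameters differ, which narrows down which tick_params call to look at.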
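Edit: The code above was originally run on Matplotlib 1.4.3 (see the comment by ImportanceOfBeingErnest). However, after updating to Matplotlib 2.0.2 the code failed to run (KeyError: 0). The reason appears to be that in Matplotlib 2.x bars are center-aligned by default. To get the code above to run, either adjust the x-axis range and tick positions so that the bars do not extend past the y-axis, or set align = 'edge' in the plotting function, i.e.: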
rects = ax.bar(tempGroupbyCalendarMonthDF.loc[tempGroupbyCalendarMonthDF['year'] == currYear,'month'],
               tempGroupbyCalendarMonthDF.loc[tempGroupbyCalendarMonthDF['year'] == currYear,'percentCases'],
               width = barWidth,
               align = 'edge')
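For anyone unsure what the align argument changes, the small sketch below compares where a single bar at x = 1 ends up under each setting. It uses the same barWidth as above; the names edgeBar, centerBar, edgeLeft and centerLeft are hypothetical.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

barWidth = 0.80
alignFig, alignAx = plt.subplots()

# One bar at x = 1 for each alignment; ax.bar returns a container of rectangles
edgeBar = alignAx.bar([1], [5], width=barWidth, align='edge')[0]
centerBar = alignAx.bar([1], [5], width=barWidth, align='center')[0]

# get_x() returns the left edge of each rectangle
edgeLeft = edgeBar.get_x()      # bar spans from 1 to 1 + barWidth
centerLeft = centerBar.get_x()  # bar spans from 1 - barWidth/2 to 1 + barWidth/2

print(edgeLeft, centerLeft)
```

With align = 'edge' the bar starts at its x value, which is why the tick positions in the question are shifted by barWidth/2 to sit under the middle of each bar. With center alignment that shift would leave the labels off-center.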