Multiple box plots by date in the index

Time: 2017-07-21 04:49:21

Tags: python pandas matplotlib

My dataframe:

index   Dates        Hours_played
0       2014-11-06   11
1       2014-12-06   4
2       2015-09-06   5
3       2015-07-06   5

Then I set Dates as the index:

             Hours_played
Dates        
2014-11-06   11
2014-12-06   4
2015-09-06   5
2015-07-06   5

Problem: when I try to create a box plot for each year found in the index, both plots are drawn on top of each other in the same grid.

df.loc['2014']['Hours_played'].plot.box(ylim=(0,200))
df.loc['2015']['Hours_played'].plot.box(ylim=(0,200))


I tried the following, but the plot is empty:

data_2015 = df.loc['2015']['Hours_played']
data_2016 = df.loc['2016']['Hours_played']
data_to_plot = [data_2015, data_2016]

mpl_fig = plt.figure()
ax = mpl_fig.add_subplot(111)
ax.boxplot(data_to_plot)
ax.set_ylim(0,300)


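Is it possible to put them in the same grid, one next to the other?

4 answers:

Answer 0: (score: 2)

A simple solution is to group by year first and then draw the box plots: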

import io

import matplotlib.pyplot as plt
import pandas as pd

# Re-create your sample data
s = """Dates,Hours_played
2014-11-06,11
2014-12-06,4
2015-09-06,5
2015-07-06,5"""
df = pd.read_table(io.StringIO(s), sep=',', index_col=0, parse_dates=True)

# The following code is the part relevant to your question.
df.groupby(df.index.year).boxplot()
plt.show()


Your second approach ends up with an empty plot because matplotlib does not correctly recognize the pandas.DataFrame. Try using the NumPy array representation instead:

import io

import matplotlib.pyplot as plt
import pandas as pd

# Re-create your sample data
s = """Dates,Hours_played
2014-11-06,11
2014-12-06,4
2015-09-06,5
2015-07-06,5"""
df = pd.read_table(io.StringIO(s), sep=',', index_col=0, parse_dates=True)

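# The following code is the part relevant to your question.
# Select the column so each group is a 1-D array; as_matrix() was removed in pandas 1.0.
data_2014 = df.loc[df.index.year == 2014, 'Hours_played'].to_numpy()
data_2015 = df.loc[df.index.year == 2015, 'Hours_played'].to_numpy()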
data_to_plot = [data_2014, data_2015]

mpl_fig = plt.figure()
ax = mpl_fig.add_subplot(111)
ax.boxplot(data_to_plot)

plt.show()
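
If there are more than two years, the per-year arrays do not have to be written out by hand. The following is only a sketch of one way to build them with groupby, assuming the same sample data as above; the names groups, years and data_to_plot are chosen for illustration, and the tick labels are set after plotting so that no particular boxplot keyword is required.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Sample data from the question, with the dates parsed into the index
s = """Dates,Hours_played
2014-11-06,11
2014-12-06,4
2015-09-06,5
2015-07-06,5"""
df = pd.read_csv(io.StringIO(s), index_col=0, parse_dates=True)

# One 1-D NumPy array per year, collected without hard-coding the years
groups = df["Hours_played"].groupby(df.index.year)
years = [year for year, _ in groups]
data_to_plot = [values.to_numpy() for _, values in groups]

# All boxes share one Axes; boxplot places them at x = 1, 2, ...
fig, ax = plt.subplots()
ax.boxplot(data_to_plot)
ax.set_xticks(range(1, len(years) + 1))
ax.set_xticklabels([str(year) for year in years])
# plt.show()  # uncomment to display the figure
```

This keeps every year on a single Axes, side by side, which is what the question asks for.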


To use subplots, you need to plot them one by one:

import io

import matplotlib.pyplot as plt
import pandas as pd

# Re-create your sample data
s = """Dates,Hours_played
2014-11-06,11
2014-12-06,4
2015-09-06,5
2015-07-06,5"""
df = pd.read_table(io.StringIO(s), sep=',', parse_dates=[0])
df['Year'] = df.Dates.dt.year
df.set_index(['Year', 'Dates'], inplace=True)

# The following code is the part relevant to your question.
mpl_fig = plt.figure()
ax1 = mpl_fig.add_subplot(121)
ax1.boxplot(df.loc[2014]['Hours_played'], labels=[2014])
ax2 = mpl_fig.add_subplot(122)
ax2.boxplot(df.loc[2015]['Hours_played'], labels=[2015])

plt.show()
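
The same subplot layout can also be produced in a loop, so the number of panels follows the number of years in the data. This is a rough sketch under the same sample-data assumption, not a replacement for the code above; sharey=True is an extra choice made here so the panels are comparable.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Sample data from the question, with the dates parsed into the index
s = """Dates,Hours_played
2014-11-06,11
2014-12-06,4
2015-09-06,5
2015-07-06,5"""
df = pd.read_csv(io.StringIO(s), index_col=0, parse_dates=True)

# Group the single column by the year of each index entry
groups = df["Hours_played"].groupby(df.index.year)

# One subplot per year in a single row; squeeze=False always returns a 2-D array
fig, axes = plt.subplots(1, groups.ngroups, sharey=True, squeeze=False)
for ax, (year, values) in zip(axes[0], groups):
    ax.boxplot(values.to_numpy())
    ax.set_title(str(year))
# plt.show()  # uncomment to display the figure
```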


Answer 1: (score: 2)

Let's reshape the data into one column per year and then draw the box plot:

df.set_index(['Dates',df.Dates.dt.year])['Hours_played'].unstack().boxplot()
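
To see what that one-liner does, it may help to look at the intermediate table before plotting. The sketch below assumes Dates is still a regular datetime column (not the index) and uses the sample data from the question; wide and the level name Year are names made up for this illustration.

```python
import io

import pandas as pd

# Sample data from the question; Dates stays a regular datetime column
s = """Dates,Hours_played
2014-11-06,11
2014-12-06,4
2015-09-06,5
2015-07-06,5"""
df = pd.read_csv(io.StringIO(s), parse_dates=["Dates"])

# Same reshape as the one-liner, split into steps
year = df["Dates"].dt.year.rename("Year")
wide = df.set_index(["Dates", year])["Hours_played"].unstack()

# One column per year; rows from other years are NaN in that column
print(wide)
```

Calling wide.boxplot() then draws one box per column, that is, one box per year.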


Answer 2: (score: 1)

If you want to put all the boxes in the same plot, you can do it like this:

import matplotlib.pyplot as plt

def setBoxColors(bp, num_plots):
    color = ['red', 'blue', 'green']
    for idx in range(num_plots):
        plt.setp(bp['boxes'][idx],        color=color[idx])
        plt.setp(bp['caps'][2*idx],       color=color[idx])
        plt.setp(bp['caps'][2*idx+1],     color=color[idx])
        plt.setp(bp['whiskers'][2*idx],   color=color[idx])
        plt.setp(bp['whiskers'][2*idx+1], color=color[idx])
        # Current matplotlib returns one fliers artist per box
        plt.setp(bp['fliers'][idx],       color=color[idx])
        plt.setp(bp['medians'][idx],      color=color[idx])

# Some fake data to plot
A = [[1, 2, 5,]]
B = [[3, 4, 5]]
C = [[1, 7, 10]]

fig = plt.figure()
ax = plt.axes()
# plt.hold() was removed in matplotlib 3.0; successive calls draw on the same axes by default

bp = plt.boxplot(A, positions = [2], widths = 0.6, patch_artist=True)
setBoxColors(bp, 1)

bp = plt.boxplot(B, positions = [6], widths = 0.6, patch_artist=True)
setBoxColors(bp, 1)

bp = plt.boxplot(C, positions = [10], widths = 0.6, patch_artist=True)
setBoxColors(bp, 1)

# set axes limits and labels
plt.xlim(0,12)
plt.ylim(0,12)
# Set the tick positions first so the three labels match three ticks
ax.set_xticks([2, 6, 10])
ax.set_xticklabels(['A', 'B', 'C'])

# draw temporary legend
hB, = plt.plot([1,1],'r-')
plt.legend((hB, ),('Type1', ))
hB.set_visible(False)

plt.show()

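Answer 3: (score: 0)

With help from Scott Boston, Y. Luo and yuhow5566, I was able to work out an interesting answer. From Scott, I learned that it is better not to use the dates as the index for this type of box plot (keep them as a regular column); from Y. Luo, I learned how to create a new column that isolates the year from the datetime values.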

df['Year'] = df['Dates'].dt.year

df.boxplot(column='Hours_played', by='Year', figsize=(9,9))
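
For completeness, here is a self-contained sketch of this approach that can be run as is. It assumes the sample data from the question, with Dates parsed as a datetime column; creating the Axes up front with plt.subplots is a choice made for this example rather than part of the original answer.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Sample data from the question; Dates is kept as a regular column
s = """Dates,Hours_played
2014-11-06,11
2014-12-06,4
2015-09-06,5
2015-07-06,5"""
df = pd.read_csv(io.StringIO(s), parse_dates=["Dates"])

# Isolate the year from each datetime value
df["Year"] = df["Dates"].dt.year

# One box per year, all on the same Axes
fig, ax = plt.subplots(figsize=(9, 9))
df.boxplot(column="Hours_played", by="Year", ax=ax)
# plt.show()  # uncomment to display the figure
```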
