How to specify the discrete values to plot on the x-axis (matplotlib, boxplot)?

Time: 2017-01-27 11:14:18

Tags: python matplotlib axis boxplot

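I am using boxplot in matplotlib (Python) to create box plots, and I create many graphs, one per date. The data on the x-axis is discrete.

The values on the x-axis (in seconds) are 0.25, 0.5, 1, 2, 5 ... 28800. These values were chosen arbitrarily (they are sampling periods). On some graphs one or two values are missing because the data was not available. On those graphs the x-axis resizes itself to spread out the other values.

I would like all graphs to have the same values at the same positions on the x-axis (it does not matter if the x-axis shows a value for which no data is plotted).

Can anyone tell me whether there is a way to specify the x-axis values? Or another way to keep the same values in the same place?

The relevant part of the code is as follows: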

for i, group in myDataframe.groupby("Date"):

    graphFilename = (basename+'_' + str(i) + '.png')
    plt.figure(graphFilename)
    group.boxplot(by=["SamplePeriod_seconds"], sym='g+') ## colour = 'blue'
    plt.grid(True)
    axes = plt.gca()
    axes.set_ylim([0,30000])
    plt.ylabel('Average distance (m)', fontsize =8)
    plt.xlabel('GPS sample interval (s)', fontsize=8)
    plt.tick_params(axis='x', which='major', labelsize=8)
    plt.tick_params(axis='y', which='major', labelsize=8)
    plt.xticks(rotation=90)
    plt.title(str(i) + ' - ' + 'Average distance travelled by cattle over 24 hour period', fontsize=9)
    plt.suptitle('')
    plt.savefig(graphFilename)
    plt.close()     

Any help is appreciated. I will keep googling... thank you.

3 answers:

Answer 0: (score: 1)

What if you try something like this:

plt.xticks(np.arange(x.min(), x.max(), 5))

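where x is the array of x values and 5 is the step taken along the axis.

The same applies to the y-axis with yticks. Hope that helps! :)

Edit:

I removed the parts I have no data for, but the code below should give you a grid to plot on: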

import matplotlib.pyplot as plt
import numpy as np


plt.grid(True)
axes = plt.gca()
axes.set_ylim([0, 30000])
plt.ylabel('Average distance (m)', fontsize=8)
plt.xlabel('GPS sample interval (s)', fontsize=8)
plt.tick_params(axis='x', which='major', labelsize=8)
plt.tick_params(axis='y', which='major', labelsize=8)
plt.xticks(rotation=90)
plt.suptitle('')
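# full list of sample periods that should appear on every graph
my_xticks = [0.25, 0.5, 1, 2, 5, 10, 20, 30, 60, 120, 300, 600, 1200, 1800, 2400, 3000, 3600, 7200, 10800, 14400, 18000, 21600, 25200, 28800]
# one evenly spaced tick position per sample period
x = np.arange(len(my_xticks))

plt.xticks(x, my_xticks)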
plt.show()

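Try inserting your values on top of this :)

Answer 1: (score: 1)

By default, boxplot simply plots the available data at consecutive positions on the axis. Missing data is left out simply because boxplot does not know that it is missing. However, the positions of the boxes can be set manually with the positions argument. The following example does this, producing plots with the same extent even when values are missing.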

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


basename = __file__+"_plot"
Nd = 4 # four different dates
Ns = 5 # five different sample periods
N = 80 # 80 values for each date and sample period
date = []
seconds = []
avgdist = []
# fill lists
for i in range(Nd):
    # for each date, select a random SamplePeriod to be not part of the dataframe
    w = np.random.randint(0,5)
    for j in range(Ns):
        if j!=w:
            av = np.random.poisson(1.36+j/10., N)*4000+1000
            avgdist.append(av) 
            seconds.append([j]*N)
            date.append([i]*N)

date = np.array(date).flatten()
seconds = np.array(seconds).flatten()
avgdist = np.array(avgdist).flatten()
#put data into DataFrame
myDataframe = pd.DataFrame({"Date" : date, "SamplePeriod_seconds" : seconds, "avgdist" : avgdist}) 
# obtain a list of all possible Sampleperiods
globalunique = np.sort(myDataframe["SamplePeriod_seconds"].unique())

for i, group in myDataframe.groupby("Date"):

    graphFilename = (basename+'_' + str(i) + '.png')
    fig = plt.figure(graphFilename, figsize=(6,3))
    axes = fig.add_subplot(111)
    plt.grid(True)

    # omit the `dates` column
    dfgroup = group[["SamplePeriod_seconds", "avgdist"]]
    # obtain a list of Sampleperiods for this date
    unique = np.sort(dfgroup["SamplePeriod_seconds"].unique())
    # plot the boxes to the axes, one for each sample periods in dfgroup
    # set the boxes' positions to the values in unique
    dfgroup.boxplot(by=["SamplePeriod_seconds"], sym='g+', positions=unique, ax=axes)

    # set xticks to the unique positions, where boxes are
    axes.set_xticks(unique)
    # make sure all plots share the same extent.
    axes.set_xlim([-0.5,globalunique[-1]+0.5])
    axes.set_ylim([0,30000])

    plt.ylabel('Average distance (m)', fontsize =8)
    plt.xlabel('GPS sample interval (s)', fontsize=8)
    plt.tick_params(axis='x', which='major', labelsize=8)
    plt.tick_params(axis='y', which='major', labelsize=8)
    plt.xticks(rotation=90)
    plt.suptitle(str(i) + ' - ' + 'Average distance travelled by cattle over 24 hour period', fontsize=9)
    plt.title("")
    plt.savefig(graphFilename)
    plt.close()    


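This still works if the values in the SamplePeriod_seconds column are not equally spaced, but of course if they differ greatly it will not produce nice results, because the boxes will overlap.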


However, this is not a problem with the plotting itself. To help further, we would need to know what you expect the final graph to look like.
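If evenly spaced slots are acceptable, one possible way around the overlap is to place each box at the index of its sample period in the full list, instead of at the period's numeric value. The following is only a minimal sketch under that assumption. The helper name boxplot_on_fixed_axis and its arguments are hypothetical and not part of the code above: ax is a matplotlib Axes, samples_by_period maps each available sample period to its values, and all_periods is the complete, ordered list of sample periods.

```python
def boxplot_on_fixed_axis(ax, samples_by_period, all_periods):
    """Draw one box per available period, in the slot given by its index in all_periods.

    Hypothetical helper: every key of samples_by_period is assumed to appear
    in all_periods, otherwise the lookup below raises a KeyError.
    """
    # Slot number of every period that could appear, e.g. {0.25: 0, 0.5: 1, ...}
    slot = {period: index for index, period in enumerate(all_periods)}

    # Only periods that actually have data get a box
    present = sorted(samples_by_period)
    positions = [slot[period] for period in present]
    data = [samples_by_period[period] for period in present]
    ax.boxplot(data, positions=positions, sym='g+')

    # Show a tick for every possible period, including the missing ones,
    # and fix the limits so that all figures share the same x-axis
    ax.set_xticks(list(range(len(all_periods))))
    ax.set_xticklabels([str(period) for period in all_periods], rotation=90)
    ax.set_xlim(-0.5, len(all_periods) - 0.5)
    return positions
```

Inside the loop over dates, the dictionary could be built from the group, for example by grouping on "SamplePeriod_seconds" and collecting the "avgdist" values of each subgroup, and ax could be the axes created with fig.add_subplot(111). Because the slots are indices rather than seconds, 0.25 and 28800 end up the same distance from their neighbours, so the axis is categorical rather than to scale.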

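Answer 2: (score: 0)

Thanks a lot for the help, everyone. Based on your answers I ended up with the following code. (I realise it could probably be improved, but I am happy that it works and I can now look at the data :))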

valuesShouldPlot = ['0.25','0.5','1.0','2.0','5.0','10.0','20.0','30.0','60.0','120.0','300.0','600.0','1200.0','1800.0','2400.0','3000.0','3600.0','7200.0','10800.0','14400.0','18000.0','21600.0','25200.0','28800.0']       


for xDate, group in myDataframe.groupby("Date"):            ## for each date

    graphFilename = (basename+'_' + str(xDate) + '.png')    ## make up a suitable filename for the graph

    plt.figure(graphFilename)

    group.boxplot(by=["SamplePeriod_seconds"], sym='g+', return_type='both')  ## create box plot, (boxplots are placed in default positions)

    ## get information on where the boxplots were placed by looking at the values on the x-axis                                                    
    axes = plt.gca()  
    checkXticks= axes.get_xticks()
    numOfValuesPlotted =len(checkXticks)            ## check how many boxplots were actually plotted by counting the labels printed on the x-axis
    lengthValuesShouldPlot = len(valuesShouldPlot)  ## (check how many boxplots should have been created if no data was missing)



    if (numOfValuesPlotted < lengthValuesShouldPlot): ## if number of values actually plotted is less than the maximum possible it means some values are missing
                                                ## if that occurs then want to move the plots across accordingly to leave gaps where the missing values should go


        labels = [item.get_text() for item in axes.get_xticklabels()]

        i=0                 ## counter to increment through the entire list of x values that should exist if no data was missing.
        j=0                 ## counter to increment through the list of x labels that were originally plotted (some labels may be missing, want to check what's missing)

        positionOfBoxesList =[] ## create a list which will eventually contain the positions on the x-axis where boxplots should be drawn  

        while ( j < numOfValuesPlotted): ## look at each value in turn in the list of x-axis labels (on the graph plotted earlier)

            if (labels[j] == valuesShouldPlot[i]):  ## if the value on the x axis matches the value in the list of 'valuesShouldPlot' 
                positionOfBoxesList.append(i)       ## then record that position as a suitable position to put a boxplot
                j = j+1
                i = i+1


            else :                                  ## if they don't match (there must be a value missing) skip the value and look at the next one             

                print("\n******** missing value ************")
                print("Date:", xDate, ", Position:", i, ":", valuesShouldPlot[i])   ## report the missing value on a single line
                i=i+1               


        plt.close()     ## close the original plot (the one that didn't leave gaps for missing data)
        group.boxplot(by=["SamplePeriod_seconds"], sym='g+', return_type='both', positions=positionOfBoxesList) ## replot with boxes in correct positions

    ## format graph to make it look better        
    plt.ylabel('Average distance (m)', fontsize =8)
    plt.xlabel('GPS sample interval (s)', fontsize=8)
    plt.tick_params(axis='x', which='major', labelsize=8)
    plt.tick_params(axis='y', which='major', labelsize=8)
    plt.xticks(rotation=90)   
    plt.title(str(xDate) + ' - ' + 'Average distance travelled by cattle over 24 hour period', fontsize=9) ## put the title above the first subplot (ie. at the top of the page)
    plt.suptitle('')
    axes = plt.gca() 
    axes.set_ylim([0,30000])

    ## save and close 
    plt.savefig(graphFilename)  
    plt.close()
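
The matching loop above compares the tick labels as strings, so a label such as '1' would not match '1.0'. A possible, more compact variant is sketched below. The function name positions_for_labels is hypothetical, and the sketch assumes that both the plotted labels and the expected values are in ascending order and can be parsed as numbers.

```python
def positions_for_labels(plotted_labels, expected_values):
    """Return (positions, missing) for the labels found on the x-axis.

    Hypothetical helper: positions holds the slot of each plotted label in
    expected_values, missing holds the expected values that were not plotted.
    """
    # Compare as floats so that '1' and '1.0' count as the same value
    expected = [float(value) for value in expected_values]
    plotted = {float(label) for label in plotted_labels}

    # A plotted label that is not expected would leave a box without a slot
    unknown = plotted - set(expected)
    if unknown:
        raise ValueError("unexpected x-axis values: %s" % sorted(unknown))

    positions = [i for i, value in enumerate(expected) if value in plotted]
    missing = [value for value in expected if value not in plotted]
    return positions, missing
```

With the names used above, the call would be positions_for_labels(labels, valuesShouldPlot), and the returned positions list could be passed as positions= in the second boxplot call in place of positionOfBoxesList.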