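I am using boxplot in matplotlib (Python) to create box plots, and I am producing many graphs, one per date. The data on the x-axis is discrete.
The values on the x-axis (in seconds) are 0.25, 0.5, 1, 2, 5 ... 28800. These values were chosen arbitrarily (they are sampling periods). On some graphs one or two values are missing because the data was not available. On those graphs the x-axis resizes itself to spread out the remaining values.
I would like all graphs to show the same values at the same positions on the x-axis (it does not matter if the x-axis shows a value for which no data is plotted).
Can someone tell me whether there is a way to specify the x-axis values? Or is there another way to keep the same values in the same place?
The relevant section of code is as follows: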
for i, group in myDataframe.groupby("Date"):
    graphFilename = (basename+'_' + str(i) + '.png')
    plt.figure(graphFilename)
    group.boxplot(by=["SamplePeriod_seconds"], sym='g+') ## colour = 'blue'
    plt.grid(True)
    axes = plt.gca()
    axes.set_ylim([0,30000])
    plt.ylabel('Average distance (m)', fontsize =8)
    plt.xlabel('GPS sample interval (s)', fontsize=8)
    plt.tick_params(axis='x', which='major', labelsize=8)
    plt.tick_params(axis='y', which='major', labelsize=8)
    plt.xticks(rotation=90)
    plt.title(str(i) + ' - ' + 'Average distance travelled by cattle over 24 hour period', fontsize=9)
    plt.suptitle('')
    plt.savefig(graphFilename)
    plt.close()
Any help is appreciated. I will keep googling... thank you.
Answer 0 (score: 1)
What if you try something like this:
plt.xticks(np.arange(x.min(), x.max(), 5))
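where x is the array of x values and 5 is the step taken along the axis.
The same works for the y-axis with yticks. Hope that helps! :)
Edit:
I have removed the parts that depend on data I don't have, but the code below should give you a grid to plot on: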
import matplotlib.pyplot as plt
import numpy as np
plt.grid(True)
axes = plt.gca()
axes.set_ylim([0, 30000])
plt.ylabel('Average distance (m)', fontsize=8)
plt.xlabel('GPS sample interval (s)', fontsize=8)
plt.tick_params(axis='x', which='major', labelsize=8)
plt.tick_params(axis='y', which='major', labelsize=8)
plt.xticks(rotation=90)
plt.suptitle('')
my_xticks =[0.25,0.5,1,2,5,10,20,30,60,120,300,600,1200,1800,2400,3000,3600,7200,10800, 14400,18000,21600,25200,28800]
# one evenly spaced tick position per sample period, labelled with the period itself
x = np.arange(len(my_xticks))
plt.xticks(x, my_xticks)
plt.show()
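Try plugging your values into this :)
Answer 1 (score: 1)
By default, boxplot simply plots the available data at successive positions on the axes. Missing data is left out because boxplot has no way of knowing it is missing. However, the positions of the boxes can be set manually using the positions argument.
The following example does this, producing plots of equal extent even when values are missing.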
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
basename = __file__+"_plot"
Nd = 4 # four different dates
Ns = 5 # five second intervals
N = 80 # each 80 values
date = []
seconds = []
avgdist = []
# fill lists
for i in range(Nd):
    # for each date, select a random SamplePeriod to be not part of the dataframe
    w = np.random.randint(0,5)
    for j in range(Ns):
        if j!=w:
            av = np.random.poisson(1.36+j/10., N)*4000+1000
            avgdist.append(av)
            seconds.append([j]*N)
            date.append([i]*N)
date = np.array(date).flatten()
seconds = np.array(seconds).flatten()
avgdist = np.array(avgdist).flatten()
#put data into DataFrame
myDataframe = pd.DataFrame({"Date" : date, "SamplePeriod_seconds" : seconds, "avgdist" : avgdist})
# obtain a list of all possible Sampleperiods
globalunique = np.sort(myDataframe["SamplePeriod_seconds"].unique())
for i, group in myDataframe.groupby("Date"):
    graphFilename = (basename+'_' + str(i) + '.png')
    fig = plt.figure(graphFilename, figsize=(6,3))
    axes = fig.add_subplot(111)
    plt.grid(True)
    # omit the `Date` column
    dfgroup = group[["SamplePeriod_seconds", "avgdist"]]
    # obtain a list of Sampleperiods for this date
    unique = np.sort(dfgroup["SamplePeriod_seconds"].unique())
    # plot the boxes to the axes, one for each sample period in dfgroup
    # set the boxes' positions to the values in unique
    dfgroup.boxplot(by=["SamplePeriod_seconds"], sym='g+', positions=unique, ax=axes)
    # set xticks to the unique positions, where boxes are
    axes.set_xticks(unique)
    # make sure all plots share the same extent.
    axes.set_xlim([-0.5,globalunique[-1]+0.5])
    axes.set_ylim([0,30000])
    plt.ylabel('Average distance (m)', fontsize =8)
    plt.xlabel('GPS sample interval (s)', fontsize=8)
    plt.tick_params(axis='x', which='major', labelsize=8)
    plt.tick_params(axis='y', which='major', labelsize=8)
    plt.xticks(rotation=90)
    plt.suptitle(str(i) + ' - ' + 'Average distance travelled by cattle over 24 hour period', fontsize=9)
    plt.title("")
    plt.savefig(graphFilename)
    plt.close()
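This still works if the values in the SamplePeriod_seconds column are not equally spaced, but if they differ widely (as 0.25 and 28800 do) the result will not look good, because the boxes will overlap.
That, however, is not a problem of the plotting itself. To help further, we would need to know what you expect the final graph to look like.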
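One way around the overlap, sketched here rather than taken from the answer above, is to place each box at the index of its sample period in the full list of periods instead of at the period value itself. Every period then gets an evenly spaced slot, and a missing period simply leaves its slot empty. The helper names slot_positions and plot_fixed_slots are made up for this illustration, the avgdist column name is borrowed from the example above and may differ in the real data, and the sketch calls matplotlib's Axes.boxplot directly rather than the pandas wrapper.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Full list of sample periods in seconds, as listed in the question.
ALL_PERIODS = [0.25, 0.5, 1, 2, 5, 10, 20, 30, 60, 120, 300, 600, 1200, 1800,
               2400, 3000, 3600, 7200, 10800, 14400, 18000, 21600, 25200, 28800]


def slot_positions(present, all_periods=ALL_PERIODS):
    """Return, for each period present, its index in the full list of periods."""
    lookup = {float(p): k for k, p in enumerate(all_periods)}
    return [lookup[float(p)] for p in present]


def plot_fixed_slots(group, ax, value_col="avgdist",
                     period_col="SamplePeriod_seconds"):
    """Draw one box per period present in `group`, each in its fixed slot.

    `value_col` is an assumed column name; adjust it to the real DataFrame.
    """
    present = np.sort(group[period_col].unique())
    data = [group.loc[group[period_col] == p, value_col].to_numpy()
            for p in present]
    ax.boxplot(data, positions=slot_positions(present), sym="g+")
    # Show every slot on the axis, whether or not a box was drawn there.
    ax.set_xticks(range(len(ALL_PERIODS)))
    ax.set_xticklabels([str(p) for p in ALL_PERIODS], rotation=90, fontsize=8)
    ax.set_xlim(-0.5, len(ALL_PERIODS) - 0.5)


# Small made-up group in which the periods 0.5 and 2 are missing.
demo = pd.DataFrame({
    "SamplePeriod_seconds": [0.25] * 4 + [1] * 4 + [5] * 4,
    "avgdist": [1000, 5000, 9000, 13000] * 3,
})
fig, ax = plt.subplots(figsize=(6, 3))
plot_fixed_slots(demo, ax)
plt.close(fig)
```

Because the positions are indices, the spacing between boxes no longer reflects the actual gap in seconds between periods, which seems to be what the question asks for, but it is a design choice worth being aware of.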
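Answer 2 (score: 0)
Thank you all very much for your help. Using your answers, I ended up with the following code. (I realise it could probably be improved, but I am glad it works and I can now look at the data :))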
valuesShouldPlot = ['0.25','0.5','1.0','2.0','5.0','10.0','20.0','30.0','60.0','120.0','300.0','600.0','1200.0','1800.0','2400.0','3000.0','3600.0','7200.0','10800.0','14400.0','18000.0','21600.0','25200.0','28800.0']
for xDate, group in myDataframe.groupby("Date"): ## for each date
    graphFilename = (basename+'_' + str(xDate) + '.png') ## make up a suitable filename for the graph
    plt.figure(graphFilename)
    group.boxplot(by=["SamplePeriod_seconds"], sym='g+', return_type='both') ## create box plot, (boxplots are placed in default positions)
    ## get information on where the boxplots were placed by looking at the values on the x-axis
    axes = plt.gca()
    checkXticks = axes.get_xticks()
    numOfValuesPlotted = len(checkXticks) ## check how many boxplots were actually plotted by counting the labels printed on the x-axis
    lengthValuesShouldPlot = len(valuesShouldPlot) ## (check how many boxplots should have been created if no data was missing)
    if (numOfValuesPlotted < lengthValuesShouldPlot): ## if number of values actually plotted is less than the maximum possible it means some values are missing
        ## if that occurs then want to move the plots across accordingly to leave gaps where the missing values should go
        labels = [item.get_text() for item in axes.get_xticklabels()]
        i = 0 ## counter to increment through the entire list of x values that should exist if no data was missing.
        j = 0 ## counter to increment through the list of x labels that were originally plotted (some labels may be missing, want to check what's missing)
        positionOfBoxesList = [] ## create a list which will eventually contain the positions on the x-axis where boxplots should be drawn
        while (j < numOfValuesPlotted): ## look at each value in turn in the list of x-axis labels (on the graph plotted earlier)
            if (labels[j] == valuesShouldPlot[i]): ## if the value on the x axis matches the value in the list of 'valuesShouldPlot'
                positionOfBoxesList.append(i) ## then record that position as a suitable position to put a boxplot
                j = j+1
                i = i+1
            else: ## if they don't match (there must be a value missing) skip the value and look at the next one
                print("\n******** missing value ************")
                print("Date:", xDate, ", Position:", i, ":", valuesShouldPlot[i])
                i = i+1
        plt.close() ## close the original plot (the one that didn't leave gaps for missing data)
        group.boxplot(by=["SamplePeriod_seconds"], sym='g+', return_type='both', positions=positionOfBoxesList) ## replot with boxes in correct positions
    ## format graph to make it look better
    plt.ylabel('Average distance (m)', fontsize =8)
    plt.xlabel('GPS sample interval (s)', fontsize=8)
    plt.tick_params(axis='x', which='major', labelsize=8)
    plt.tick_params(axis='y', which='major', labelsize=8)
    plt.xticks(rotation=90)
    plt.title(str(xDate) + ' - ' + 'Average distance travelled by cattle over 24 hour period', fontsize=9) ## put the title above the first subplot (ie. at the top of the page)
    plt.suptitle('')
    axes = plt.gca()
    axes.set_ylim([0,30000])
    ## save and close
    plt.savefig(graphFilename)
    plt.close()
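The missing-value report in the loop above relies on reading the tick labels back from a first plot. As a possible simplification, and only as a sketch, the missing periods could instead be worked out from the data itself before anything is drawn. The function name missing_periods and the tiny demo DataFrame are invented for this illustration; only the SamplePeriod_seconds and Date column names come from the thread.

```python
import pandas as pd


def missing_periods(group, all_periods, period_col="SamplePeriod_seconds"):
    """Return the expected sample periods that have no rows in `group`."""
    present = set(group[period_col].astype(float))
    return [p for p in all_periods if float(p) not in present]


# Made-up data: date "d2" has no rows for the 0.5 s sample period.
expected = [0.25, 0.5, 1]
df = pd.DataFrame({
    "Date": ["d1", "d1", "d1", "d2", "d2"],
    "SamplePeriod_seconds": [0.25, 0.5, 1, 0.25, 1],
    "avgdist": [1000, 2000, 3000, 1500, 3500],
})

for xDate, group in df.groupby("Date"):
    missing = missing_periods(group, expected)
    if missing:
        print("Date:", xDate, "missing sample periods:", missing)
```

Comparing numbers rather than label strings also avoids depending on how the tick labels happen to be formatted (for example '1.0' versus '1').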