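Outliers lying between 1.5 and 3 times the IQR beyond the box are marked with "+", and those more than 3 times the IQR beyond it are marked with "o". But because this dataset has a large number of outliers, the box plot is very hard to read: the "+" and "o" symbols are drawn on top of each other, creating what looks like a thick red line.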
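The marker rule described above can be reproduced by hand, for example to count how many points fall into each class before plotting. A minimal sketch follows. `classify_outliers` is a hypothetical helper, not part of Octave or the statistics package, and it uses core `quantile` with its default method, which may not match the quartile calculation boxplot uses internally exactly.

```octave
1;  # marks this file as a script so it can define functions

## Hypothetical helper (not part of Octave or the statistics package):
## classify the values of x the way boxplot's markers are described:
## "+" for points 1.5 to 3 x IQR beyond the box, "o" for points more
## than 3 x IQR beyond it. Quartiles come from core quantile() with its
## default method, which may differ slightly from boxplot's own.
function [plus_idx, o_idx] = classify_outliers (x)
  x = x(:);
  q = quantile (x, [0.25; 0.75]);   # lower and upper quartile
  iqr_v = q(2) - q(1);              # interquartile range
  ## distance beyond the nearer box edge (negative for points inside it)
  dist = max (q(1) - x, x - q(2));
  o_idx    = find (dist > 3 * iqr_v);
  plus_idx = find (dist > 1.5 * iqr_v & dist <= 3 * iqr_v);
endfunction

## Example: 20 should be classed "+" and 100 "o"
[plus_idx, o_idx] = classify_outliers ([1:10, 20, 100])
```

Counting `numel (plus_idx) + numel (o_idx)` per group gives a quick idea of how many markers boxplot would stack on top of each other.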
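I need to plot all the data, so removing it is not an option, but I could show "longer" boxes, i.e. stretch q1 and q4 (the whiskers) to reach the true min/max and skip the "+" and "o" outlier symbols. I would also be fine with showing only the minimum and maximum outliers.
I'm completely in the dark here: the Octave boxplot documentation I found here doesn't contain any useful examples of how to handle outliers, and searching Stack Overflow didn't get me any closer to a solution. So any help or direction is much appreciated!
How can I modify the code below to create a readable box plot from the same dataset (i.e. one that doesn't draw the outliers on top of each other and create a thick red line)?
I'm using Octave 4.2.1 64-bit on a Windows 10 machine, with qt as the graphics_toolkit, and I call gdal_translate from within Octave to process the tif files.
Switching the graphics_toolkit to gnuplot or similar is not an option, because then I can't "rotate" the plot (horizontal boxes instead of vertical ones). And the result has to show up in the .pdf file, not just in the Octave plot viewer.
Please excuse my completely "newbie-style" coding effort to get a proper high-resolution PDF export: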
pkg load statistics
clear all;
fns = glob ("*.tif");
for k=1:numel (fns)
  ofn = tmpnam;
  cmd = sprintf ('gdal_translate -of aaigrid "%s" "%s"', fns{k}, ofn);
  [s, out] = system (cmd);
  if (s != 0)
    error ('calling gdal_translate failed with "%s"', out);
  endif
  fid = fopen (ofn, "r");
  # read 6 headerlines
  hdr = [];
  for i=1:6
    s = strsplit (fgetl (fid), " ");
    hdr.(s{1}) = str2double (s{2});
  endfor
  d = dlmread (fid);
  # check size against header
  assert (size (d), [hdr.nrows hdr.ncols])
  # set nodata to NA
  d (d == hdr.NODATA_value) = NA;
  raw{k} = d;
  # create copy with existing values
  raw_v{k} = d(! isna (d));
  fclose (fid);
endfor
## generate plot
boxplot (raw_v)
set (gca, "xtick", 1:numel(fns),
     "xticklabel", strrep (fns, ".tif", ""));
ylabel ("Plats kvar (meter)");
set (gca, "ytick", 0:50:600);
set (gca, "ygrid", "on");
set (gca, "gridlinestyle", "--");
set (gcf, "paperunit", "centimeters", "papersize", [35, 60], "paperposition", [0 0 60 30], "paperorientation", "landscape")
zoom (0.95)
view ([90 90])
print ("loudden_box_dotted.pdf", "-F:14")
Answer (score: 1)
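I would delete the outlier markers. That's easy, because boxplot returns the graphics handles. I also added some caching, so you don't have to reload all the tifs every time you play around with the plot. It's always a good idea to split conversion, processing and plotting into separate scripts (but not on Stack Overflow, where a minimal example takes priority). Here we go: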
pkg load statistics
cache_fn = "input.raw";
# only process tif if not already done
if (! exist (cache_fn, "file"))
  fns = glob ("*.tif");
  for k=1:numel (fns)
    ofn = tmpnam;
    cmd = sprintf ('gdal_translate -of aaigrid "%s" "%s"', fns{k}, ofn);
    printf ("calling '%s'...\n", cmd);
    fflush (stdout);
    [s, out] = system (cmd);
    if (s != 0)
      error ('calling gdal_translate failed with "%s"', out);
    endif
    fid = fopen (ofn, "r");
    # read 6 headerlines
    hdr = [];
    for i=1:6
      s = strsplit (fgetl (fid), " ");
      hdr.(s{1}) = str2double (s{2});
    endfor
    d = dlmread (fid);
    # check size against header
    assert (size (d), [hdr.nrows hdr.ncols])
    # set nodata to NA
    d (d == hdr.NODATA_value) = NA;
    raw{k} = d;
    # create copy with existing values
    raw_v{k} = d(! isna (d));
    fclose (fid);
  endfor
  # save result
  save (cache_fn, "raw_v", "fns");
else
  load (cache_fn)
endif
## generate plot
[s, h] = boxplot (raw_v);
## in h you'll find now box, whisker, median, outliers and outliers2
## delete them
delete (h.outliers)
delete (h.outliers2)
set (gca, "xtick", 1:numel(fns),
     "xticklabel", strrep (fns, ".tif", ""));
ylabel ("Plats kvar (meter)");
set (gca, "ytick", 0:50:600);
set (gca, "ygrid", "on");
set (gca, "gridlinestyle", "--");
set (gcf, "paperunit", "centimeters", "papersize", [35, 60], "paperposition", [0 0 60 30], "paperorientation", "landscape")
zoom (0.95)
view ([90 90])
print ("loudden_box_dotted.pdf", "-F:14")
Which gives: [figure: resulting box plot with the outlier markers removed]
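The question also says that showing only the minimum and maximum outliers would be acceptable. One way to approximate that, sketched here under the assumption that the outlier markers have already been deleted as in the answer, is to compute each group's extremes and overlay them. `group_extremes` is a hypothetical helper name; it skips NaN/NA values and returns NaN for a group with no valid data.

```octave
1;  # marks this file as a script so it can define functions

## Hypothetical helper: per-group minimum and maximum of a cell array of
## vectors (such as raw_v in the question), ignoring NaN/NA values.
## Row k of ext holds [min, max] of groups{k}; an all-NaN or empty group
## gives [NaN, NaN].
function ext = group_extremes (groups)
  ext = NaN (numel (groups), 2);
  for k = 1:numel (groups)
    v = groups{k}(:);
    v = v(! isnan (v));          # isnan is also true for NA
    if (! isempty (v))
      ext(k, :) = [min(v), max(v)];
    endif
  endfor
endfunction

ext = group_extremes ({[3 1 NaN 7], [NA; -2; 5], []})
```

With `ext = group_extremes (raw_v);` after the `delete` calls, something like `hold on; plot (1:numel (raw_v), ext(:, 1), "+", 1:numel (raw_v), ext(:, 2), "+")` should mark only the two extremes per box, assuming boxplot draws box k at position k (as the answer's `xtick` setting of `1:numel(fns)` implies). Because `view ([90 90])` rotates the whole axes, the overlay should rotate together with the boxes; this has not been checked against Octave 4.2.1.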