How to handle the outliers in this Octave boxplot to improve readability

Time: 2018-02-12 16:32:29

Tags: octave boxplot readability outliers

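Outliers that lie between 1.5 and 3 times the interquartile range (IQR) are marked with "+", and those beyond 3 times the IQR are marked with "o". But since this data set has many outliers, the resulting boxplot is very hard to read: the "+" and "o" symbols are drawn on top of each other and form what looks like a thick red line.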
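To see how many points fall into each class before plotting, the rule above can be sketched as follows. This is only an illustration: the sample vector `x` is made up, and it uses Octave's default `quantile` method, which may not match exactly how the statistics package computes its quartiles.

```octave
# hypothetical sample values; 35 and 60 stand in for outliers
x = [10:20, 35, 60];

# first and third quartile (Octave's default quantile method)
q = quantile (x(:), [0.25; 0.75]);
iqr_x = q(2) - q(1);

# distance of each point outside the box (negative for points inside it)
dist = max (q(1) - x, x - q(2));

# "+" class: between 1.5 and 3 times the IQR; "o" class: beyond 3 times the IQR
mild    = (dist > 1.5 * iqr_x) & (dist <= 3 * iqr_x);
extreme = (dist > 3 * iqr_x);

printf ("mild: %d, extreme: %d\n", nnz (mild), nnz (extreme));
```

With a real raster, large counts in either class explain why the markers merge into a solid line.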

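I need to plot all the data, so removing values is not an option. However, I would be fine with showing "longer" boxes, i.e. stretching q1 and q4 to reach the true min/max values and skipping the "+" and "o" outlier symbols. I would also be fine with showing only the minimum and maximum outliers.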

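I'm completely in the dark here, and the Octave boxplot documentation I found does not include any useful examples of how to handle outliers. Searching Stack Overflow didn't bring me any closer to a solution either. So any help or direction is much appreciated!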

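How can I modify the code below to create a readable boxplot from the same data set (i.e. one where the outliers are not drawn on top of each other to form a thick red line)?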


I'm using Octave 4.2.1 64-bit on a Windows 10 machine with qt as the graphics_toolkit, and GDAL_TRANSLATE is called from within Octave to process the tif files.

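Switching the graphics_toolkit to gnuplot etc. is not an option, because then I can't "rotate" the plot (horizontal boxes instead of vertical ones). Also, the result has to show up in the .pdf file, not just in the Octave viewer.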

Please excuse my thoroughly "newbie-style" code, including the workaround for getting a proper high-resolution PDF export:

pkg load statistics

clear all;
fns = glob ("*.tif");
for k=1:numel (fns)

  ofn = tmpnam;
  cmd = sprintf ('gdal_translate -of aaigrid "%s" "%s"', fns{k}, ofn);
  [s, out] = system (cmd);
  if (s != 0)
    error ('calling gdal_translate failed with "%s"', out);
  endif
  fid = fopen (ofn, "r");
  # read 6 headerlines
  hdr = [];
  for i=1:6
    s = strsplit (fgetl (fid), " ");
    hdr.(s{1}) = str2double (s{2});
  endfor
  d = dlmread (fid);

  # check size against header
  assert (size (d), [hdr.nrows hdr.ncols])

  # set nodata to NA
  d (d == hdr.NODATA_value) = NA;

  raw{k} = d;

  # create copy with existing values
  raw_v{k} = d(! isna (d));

  fclose (fid);

endfor

## generate plot
boxplot (raw_v)


set (gca, "xtick", 1:numel(fns),
          "xticklabel", strrep (fns, ".tif", ""));
ylabel ("Plats kvar (meter)");

set (gca, "ytick", 0:50:600);
set (gca, "ygrid", "on");
set (gca, "gridlinestyle", "--");

set (gcf, "paperunit", "centimeters", "papersize", [35, 60], "paperposition", [0 0 60 30], "paperorientation", "landscape")          


zoom (0.95)
view ([90 90])

print ("loudden_box_dotted.pdf", "-F:14")

1 Answer:

Answer 0 (score: 1)

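I would just delete the outlier markers. That's easy, because boxplot returns their handles. I've also added some caching, so you don't have to reload all the tifs while you play with the plot. Splitting conversion, processing and plotting into separate scripts is always a good idea (but not on Stack Overflow, where minimal examples are preferred). Here we go: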

pkg load statistics

cache_fn = "input.raw";

# only process tif if not already done
if (! exist (cache_fn, "file"))
  fns = glob ("*.tif");
  for k=1:numel (fns)

    ofn = tmpnam;
    cmd = sprintf ('gdal_translate -of aaigrid "%s" "%s"', fns{k}, ofn);
    printf ("calling '%s'...\n", cmd);
    fflush (stdout);
    [s, out] = system (cmd);
    if (s != 0)
      error ('calling gdal_translate failed with "%s"', out);
    endif
    fid = fopen (ofn, "r");
    # read 6 headerlines
    hdr = [];
    for i=1:6
      s = strsplit (fgetl (fid), " ");
      hdr.(s{1}) = str2double (s{2});
    endfor
    d = dlmread (fid);

    # check size against header
    assert (size (d), [hdr.nrows hdr.ncols])

    # set nodata to NA
    d (d == hdr.NODATA_value) = NA;

    raw{k} = d;

    # create copy with existing values
    raw_v{k} = d(! isna (d));

    fclose (fid);

  endfor

  # save result
  save (cache_fn, "raw_v", "fns");
else
  load (cache_fn)
endif

## generate plot
[s, h] = boxplot (raw_v);

## h now holds the handles for box, whisker, median, outliers and outliers2
## delete only the two groups of outlier markers
delete (h.outliers)
delete (h.outliers2)

set (gca, "xtick", 1:numel(fns),
          "xticklabel", strrep (fns, ".tif", ""));
ylabel ("Plats kvar (meter)");

set (gca, "ytick", 0:50:600);
set (gca, "ygrid", "on");
set (gca, "gridlinestyle", "--");

set (gcf, "paperunit", "centimeters", "papersize", [35, 60], "paperposition", [0 0 60 30], "paperorientation", "landscape")          

zoom (0.95)
view ([90 90])

print ("loudden_box_dotted.pdf", "-F:14")

which gives:

[image: generated plot]
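If you would still like to show only the smallest and largest value of each data set, as the question suggests, one option is to draw them yourself after deleting the markers. The following is a minimal sketch under a few assumptions: `data` is a made-up stand-in for `raw_v`, the boxes sit at x = 1, 2, ... (the default positions), and every data set has outliers of both kinds, so that both handle fields are populated. I have not tried it on the tif data.

```octave
pkg load statistics

# hypothetical stand-in for raw_v: two column vectors with a few outliers each
data = {[1:20, 40, 80]', [5:30, 55, 100]'};

[s, h] = boxplot (data);

# remove the overlapping "+" and "o" markers, as above
delete (h.outliers)
delete (h.outliers2)

# true extremes of each data set, computed from the data itself
mins = cellfun (@min, data);
maxs = cellfun (@max, data);

# draw one marker per extreme at the x position of the corresponding box
x = 1:numel (data);
hold on
plot (x, mins, "r+", x, maxs, "r+");
hold off
```

This keeps every data set's full range visible with just two markers per box, however many outliers lie in between.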