我使用Matlab绘制了一个矢量y(1xN)的简单箱图。我使用了多个分组变量:x1,x2,x3
x1(1xN)表示长度(0.5,1,2或3)
x2(1xN)表示规格(26或30)
x3(1xN单元阵列)表示供应商的名称。
close all; clc;
N = 1000;
% measurements values: they represent some kind of an
% electrical characteristic of a cable.
y = randn(N,1);
% each cable being measured can be of length 1m, 2m, or 3m:
x1 = randi(3,N,1);
% each cable being measured have a gauge of 1awg or 2awg:
x2 = randi(2,N,1);
% each cable can be produced by a different vendor. for instance: 'SONY' or
% 'YAMAHA'
x3 = cell(N,1);
for ii = 1:N
if mod(ii,3) == 0
x3{ii} = 'SONY';
else
x3{ii} = 'YAMAHA';
end
end
figure(1)
boxplot(y,{x1,x2,x3});
我想在此箱图上绘制一个散点图,以显示创建箱线图的y的相关值,但我找不到一个将值分组为boxplot函数的函数。
我发现的最接近的是以下function,但它只接受一个分组变量。
任何帮助?
答案 0 :(得分:0)
箱线图的方框由IQR决定。方框和异常值之间的数据是从上下四分位数的1.5 * IQR范围内的所有数据。您可以手动过滤数据。
例如......
% data generation
data=randn(100,3);
%%
datas=sort(data);
datainbox=datas(ceil(end/4)+1:floor(end*3/4),:);
[n1 n2]=size(datainbox);
figure(1);clf
boxplot(data); hold on
plot(ones(n1,1)*[1 2 3],datainbox,'k.')
%%
% All datapoints coincide now horizontally. Consider adding a little random
% horizontal play to make them not coincide:
figure(2);clf
boxplot(data); hold on
plot(ones(n1,1)*[1 2 3]+.4*(rand(n1,n2)-.5),datainbox,'k.')
%%
% If you want to add all data between boxes and outliers too, do something like:
dataoutbox=datas([1:ceil(end/4) floor(end*3/4)+1:end],:);
n3=size(dataoutbox,1);
% calculate quartiles
dataq=quantile(data,[.25 .5 .75]);
% calculate range between box and outliers = between 1.5*IQR from quartiles
dataiqr=iqr(data);
datar=[dataq(1,:)-dataiqr*1.5;dataq(3,:)+dataiqr*1.5];
dataoutbox(dataoutbox<ones(n3,1)*datar(1,:)|dataoutbox>ones(n3,1)*datar(2,:))=nan;
figure(3);clf
boxplot(data); hold on
plot(ones(n1,1)*[1 2 3]+.4*(rand(n1,n2)-.5),datainbox,'k.')
plot(ones(n3,1)*[1 2 3]+.4*(rand(n3,n2)-.5),dataoutbox,'.','color',[1 1 1]*.5)
答案 1 :(得分:0)
找到了一个简单的解决方案:
我编辑了'boxplot'功能的签名,所以除了'h'之外它还会返回'groupIndexByPoint':
function [h,groupIndexByPoint] = boxplot(varargin)
groupIndexByPoint是'boxplot'使用的内部变量。
现在只需在原始代码中添加4行:
N = 1000;
% measurements values: they represent some kind of an
% electrical characteristic of a cable.
y = randn(N,1);
% each cable being measured can be of length 1m, 2m, or 3m:
x1 = randi(3,N,1);
% each cable being measured have a gauge of 1awg or 2awg:
x2 = randi(2,N,1);
% each cable can be produced by a different vendor. for instance: 'SONY' or
% 'YAMAHA'
x3 = cell(N,1);
for ii = 1:N
if mod(ii,3) == 0
x3{ii} = 'SONY';
else
x3{ii} = 'YAMAHA';
end
end
figure(1);
hold on;
[h,groups] = boxplot(y,{x1,x2,x3});
scattering_factor = 0.3;
scaterring_vector = (rand(N,1)-0.5)*scattering_factor;
groups_scattered = groups + scaterring_vector;
plot(groups_scattered,y,'.g');