Using Matlab

Time: 2015-09-16 15:04:50

Tags: matlab scatter-plot boxplot scatter

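I used Matlab to draw a simple box plot of a vector y (1xN), with several grouping variables: x1, x2, x3.

x1 (1xN) is the length (0.5, 1, 2 or 3)

x2 (1xN) is the gauge (26 or 30)

x3 (1xN cell array) is the name of the vendor.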

close all; clc;

N = 1000;


% measurements values: they represent some kind of an
% electrical characteristic of a cable.
y = randn(N,1);

% each cable being measured can be of length 1m, 2m, or 3m:
x1 = randi(3,N,1);

% each cable being measured has a gauge of 1awg or 2awg:
x2 = randi(2,N,1);

% each cable can be produced by a different vendor. for instance: 'SONY' or
% 'YAMAHA' 

x3 = cell(N,1);

for ii = 1:N
   if mod(ii,3) == 0
       x3{ii} = 'SONY';
   else
       x3{ii} = 'YAMAHA';
   end
end

figure(1)
boxplot(y,{x1,x2,x3});

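I would like to draw a scatter plot on top of this box plot, showing the y values each box was built from, but I cannot find a function that groups the values the way the boxplot function does.

The closest thing I found is the following function, but it accepts only one grouping variable.

Any help?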

2 answers:

Answer 0: (score: 0)

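The box in a box plot spans the interquartile range (IQR), from the lower quartile to the upper quartile. The data between the box and the outliers are all points lying within 1.5 * IQR beyond the lower and upper quartiles. You can filter the data manually.

For example...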

% data generation 
data=randn(100,3);

%% 
datas=sort(data);
datainbox=datas(ceil(end/4)+1:floor(end*3/4),:);

[n1 n2]=size(datainbox);

figure(1);clf
boxplot(data); hold on
plot(ones(n1,1)*[1 2 3],datainbox,'k.')

%% 
% All datapoints coincide now horizontally. Consider adding a little random
% horizontal play to make them not coincide:

figure(2);clf
boxplot(data); hold on
plot(ones(n1,1)*[1 2 3]+.4*(rand(n1,n2)-.5),datainbox,'k.')

%%
% If you want to add all data between boxes and outliers too, do something like:

dataoutbox=datas([1:ceil(end/4) floor(end*3/4)+1:end],:);
n3=size(dataoutbox,1);
% calculate quartiles
dataq=quantile(data,[.25 .5 .75]);
% calculate range between box and outliers = between 1.5*IQR from quartiles
dataiqr=iqr(data);
datar=[dataq(1,:)-dataiqr*1.5;dataq(3,:)+dataiqr*1.5];
dataoutbox(dataoutbox<ones(n3,1)*datar(1,:)|dataoutbox>ones(n3,1)*datar(2,:))=nan;

figure(3);clf
boxplot(data); hold on
plot(ones(n1,1)*[1 2 3]+.4*(rand(n1,n2)-.5),datainbox,'k.')
plot(ones(n3,1)*[1 2 3]+.4*(rand(n3,n2)-.5),dataoutbox,'.','color',[1 1 1]*.5)
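
The same IQR rule can also be written with logical masks for a single vector, which may be easier to read than index arithmetic on sorted data. This is only a sketch: the names q, w, inBox, inWhisker and isOutlier are made up for illustration, and it assumes boxplot is left at its default whisker length of 1.5 * IQR.

```matlab
rng(0);                                   % reproducible example data
y = randn(200,1);

q = quantile(y,[0.25 0.75]);              % lower and upper quartile
w = 1.5*(q(2)-q(1));                      % whisker reach: 1.5 * IQR

inBox     = y >= q(1) & y <= q(2);                     % inside the box
inWhisker = ~inBox & y >= q(1)-w & y <= q(2)+w;        % between box and outliers
isOutlier = ~inBox & ~inWhisker;                       % already drawn by boxplot

figure; boxplot(y); hold on
xj = 1 + 0.3*(rand(size(y))-0.5);         % horizontal jitter around box 1
plot(xj(inBox),y(inBox),'k.')
plot(xj(inWhisker),y(inWhisker),'.','color',[1 1 1]*.5)
hold off
```

Every point falls into exactly one of the three masks, so the outliers can be skipped in the overlay without losing any data.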

Answer 1: (score: 0)

Found a simple solution:

I edited the signature of the 'boxplot' function so that, besides 'h', it also returns 'groupIndexByPoint':

function [h,groupIndexByPoint] = boxplot(varargin)

groupIndexByPoint is an internal variable used by 'boxplot'.

Now just add 4 lines to the original code:

N = 1000;

% measurements values: they represent some kind of an
% electrical characteristic of a cable.
y = randn(N,1);

% each cable being measured can be of length 1m, 2m, or 3m:
x1 = randi(3,N,1);

% each cable being measured has a gauge of 1awg or 2awg:
x2 = randi(2,N,1);

% each cable can be produced by a different vendor. for instance: 'SONY' or
% 'YAMAHA' 

x3 = cell(N,1);

for ii = 1:N
   if mod(ii,3) == 0
       x3{ii} = 'SONY';
   else
       x3{ii} = 'YAMAHA';
   end
end

figure(1);
hold on;
[h,groups] = boxplot(y,{x1,x2,x3});
scattering_factor = 0.3;
scattering_vector = (rand(N,1)-0.5)*scattering_factor;
groups_scattered = groups + scattering_vector;
plot(groups_scattered,y,'.g');
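
If you would rather not modify the shipped boxplot.m, a similar per-point index can probably be built beforehand with unique. The sketch below is an alternative, not the method above: vendorCode, combos, g and jitter are illustrative names, and it assumes that boxplot, given a single numeric grouping variable, places the groups in ascending order at x = 1, 2, ..., which is worth checking on your Matlab version.

```matlab
rng(0);                               % reproducible example data
N  = 1000;
y  = randn(N,1);
x1 = randi(3,N,1);                    % length
x2 = randi(2,N,1);                    % gauge
x3 = repmat({'YAMAHA'},N,1);          % vendor
x3(3:3:N) = {'SONY'};

% Map vendor names to integer codes, then number every distinct
% (x1,x2,x3) combination. g(k) is the combination index of point k.
[~,~,vendorCode] = unique(x3);
[combos,~,g]     = unique([x1 x2 vendorCode],'rows');

figure;
boxplot(y,g);                         % one numeric grouping variable
hold on;
jitter = (rand(N,1)-0.5)*0.3;         % horizontal spread
plot(g+jitter,y,'.g');
hold off;
```

The boxes are then labelled 1, 2, ... instead of with the combined group names; row k of combos records which length, gauge and vendor code box k stands for.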