When applying this method:
%% When an outlier is considered to be more than three standard deviations away from the mean, use the following syntax to determine the number of outliers in each column of the count matrix:
mu = mean(data)
sigma = std(data)
[n,p] = size(data);
% Create a matrix of mean values by replicating the mu vector for n rows
MeanMat = repmat(mu,n,1);
% Create a matrix of standard deviation values by replicating the sigma vector for n rows
SigmaMat = repmat(sigma,n,1);
% Create a matrix of zeros and ones, where ones indicate the location of outliers
outliers = abs(data - MeanMat) > 3*SigmaMat;
% Calculate the number of outliers in each column
nout = sum(outliers)
% To remove an entire row of data containing the outlier
data(any(outliers,2),:) = []; %% this line
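The last line removes a certain number of observations (rows) from my dataset. However, I run into a problem later in the program, because I hard-coded the number of observations (rows) as 1000.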
%% generate sample data
K = 6;
numObservarations = 1000;
dimensions = 3;
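If I change numObservarations to data, I get a scalar output error, but if I leave it unchanged, I get the following error because the number of rows no longer matches: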
??? Error using ==> minus
Matrix dimensions must agree.
Error in ==> datamining at 106
D(:,k) = sum( ((data - repmat(clusters(k,:),numObservarations,1)).^2), 2);
Is there a way to set numObservarations so that it automatically detects the number of rows in data and returns it as a single number?
Answer 0 (score: 5)
I must be misunderstanding something. As far as I can tell, this should be enough:
numObservations = size(data, 1);
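To show where that call would go, here is a minimal sketch that assumes the row count is queried after the outlier rows are deleted. The toy data matrix and the choice of the first K rows as clusters are made up for illustration and are not from the question. The variable keeps the question's spelling, numObservarations, so that it matches the line that raised the error.

```matlab
% Toy stand-in for the real data: 1000 rows, 3 columns (assumption)
data = repmat([1 2 3; 2 3 4; 3 4 5; 4 5 6], 250, 1);
data(1, :) = [500 500 500];              % plant one obvious outlier row

% Outlier removal as in the question
mu = mean(data);
sigma = std(data);
n = size(data, 1);
outliers = abs(data - repmat(mu, n, 1)) > 3 * repmat(sigma, n, 1);
data(any(outliers, 2), :) = [];          % the row count changes here

% Query the size AFTER the deletion instead of hard-coding 1000
numObservarations = size(data, 1);       % scalar; 999 for this toy data

K = 6;
clusters = data(1:K, :);                 % hypothetical initial centroids
D = zeros(numObservarations, K);
for k = 1:K
    % Squared Euclidean distance from every row to centroid k
    D(:, k) = sum((data - repmat(clusters(k, :), numObservarations, 1)).^2, 2);
end
```

Passing the dimension argument makes size return a single scalar. If the column count is needed as well, [numObservarations, dimensions] = size(data); returns both in one call.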