Remove duplicate rows from a cell array

Date: 2014-05-30 16:17:32

Tags: string matlab cell repeat

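I have a cell array with many rows, and some of those rows are repeated. I want to delete the repeated rows and keep only the first occurrence of each. Note that I am mostly working with string values, so the usual convenient functions do not work. Can anyone help? Here is an example: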

19970101 18659 183 '19980820' '00018659' 'RUNYON L' '00001534' 'MERRILL'
19970101 18290 183 '19981221' '00018290' 'MANTON S' '00001534' 'MERRILL'
19970101 10280 183 '19980819' '00010280' 'BRENNAN S' '00001534' 'MERRILL'
19970101 18659 183 '19980820' '00018659' 'RUNYON L' '00001534' 'MERRILL'
19970101 10280 183 '19980819' '00010280' 'BRENNAN S' '00001534' 'MERRILL'

What I want to get:

19970101 18659 183 '19980820' '00018659' 'RUNYON L' '00001534' 'MERRILL'
19970101 18290 183 '19981221' '00018290' 'MANTON S' '00001534' 'MERRILL'
19970101 10280 183 '19980819' '00010280' 'BRENNAN S' '00001534' 'MERRILL'

2 Answers:

Answer 0: (score: 3)

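You can first sort the rows of the cell array (sortrows) and then identify the repeated rows with linear complexity, by applying isequal to consecutive rows.

Let cellArray denote your input cell array: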

cellArray = {19970101 18659 183 '19980820' '00018659' 'RUNYON L' '00001534' 'MERRILL'
             19970101 18290 183 '19981221' '00018290' 'MANTON S' '00001534' 'MERRILL'
             19970101 10280 183 '19980819' '00010280' 'BRENNAN S' '00001534' 'MERRILL'
             19970101 18659 183 '19980820' '00018659' 'RUNYON L' '00001534' 'MERRILL'
             19970101 10280 183 '19980819' '00010280' 'BRENNAN S' '00001534' 'MERRILL'}

Code:

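% Sort the rows so that identical rows become adjacent; jj maps sorted rows back to original row indices
[sorted, jj] = sortrows(cellArray);
% ind(n) is true when sorted row n is identical to sorted row n+1
ind = arrayfun(@(n) isequal(sorted(n,:),sorted(n+1,:)), 1:size(cellArray,1)-1);
% Keep the first row of each run of duplicates, then restore the original row order
result = cellArray(sort(jj([true ~ind])),:);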

Result:

result = 
    [19970101]    [18659]    [183]    '19980820'    '00018659'    'RUNYON L'     '00001534'    'MERRILL'
    [19970101]    [18290]    [183]    '19981221'    '00018290'    'MANTON S'     '00001534'    'MERRILL'
    [19970101]    [10280]    [183]    '19980819'    '00010280'    'BRENNAN S'    '00001534'    'MERRILL'
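
As a cross-check that does not depend on sorting, here is a minimal sketch (not part of the answer above) that compares each row against the rows already kept, using isequal. It needs a quadratic number of comparisons in the worst case, so it is only a reasonable choice for small arrays. The variable name keep is purely illustrative.

```matlab
% Example input copied from the question
cellArray = {19970101 18659 183 '19980820' '00018659' 'RUNYON L' '00001534' 'MERRILL'
             19970101 18290 183 '19981221' '00018290' 'MANTON S' '00001534' 'MERRILL'
             19970101 10280 183 '19980819' '00010280' 'BRENNAN S' '00001534' 'MERRILL'
             19970101 18659 183 '19980820' '00018659' 'RUNYON L' '00001534' 'MERRILL'
             19970101 10280 183 '19980819' '00010280' 'BRENNAN S' '00001534' 'MERRILL'};

nRows = size(cellArray, 1);
keep = true(nRows, 1);            % keep(r) stays true for first occurrences only

for r = 2:nRows
    for k = 1:r-1
        % Compare only against earlier rows that were themselves kept
        if keep(k) && isequal(cellArray(r,:), cellArray(k,:))
            keep(r) = false;      % row r repeats an earlier row
            break
        end
    end
end

result = cellArray(keep, :);      % first occurrences, in original order
```

For the example data, keep marks rows 1 to 3 and drops rows 4 and 5, which should agree with the sorted approach.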

Answer 1: (score: 2)

Try this -

%// Input cell array
input_cell_array ={
    19970101 18659 183 '19980820' '00018659' 'RUNYON L' '00001534' 'MERRILL'
    19970101 18290 183 '19981221' '00018290' 'MANTON S' '00001534' 'MERRILL'
    19970101 10280 183 '19980819' '00010280' 'BRENNAN S' '00001534' 'MERRILL'
    19970101 18659 183 '19980820' '00018659' 'RUNYON L' '00001534' 'MERRILL'
    19970101 10280 183 '19980819' '00010280' 'BRENNAN S' '00001534' 'MERRILL'}

%// "Standardize" the cells by converting all into strings
allstrc = cellfun(@num2str,input_cell_array,'uni',0)

%// Group each column as one cell for labelling them
allstrcg = mat2cell(allstrc,size(allstrc,1),ones(1,size(allstrc,2)))

%// Label them with unique command
[~,~,row_ind] = cellfun(@(x) unique(x,'stable'),allstrcg,'uni',0)

%// The index vectors in row_ind may come back as row or column vectors,
%// so reshape each of them into a column vector -
row_ind = cellfun(@(x) reshape(x,[],1),row_ind,'uni',0) 

%// Get a double array of the labels 
mat1 = horzcat(row_ind{:})

%// Get unique rows of the labels
[~,ind] = unique(mat1,'rows','stable')

%// Finally get the desired output by selecting the unique rows from the labels
out = input_cell_array(ind,:)

Output -

[19970101]    [18659]    [183]    '19980820'    '00018659'    'RUNYON L'     '00001534'    'MERRILL'
[19970101]    [18290]    [183]    '19981221'    '00018290'    'MANTON S'     '00001534'    'MERRILL'
[19970101]    [10280]    [183]    '19980819'    '00010280'    'BRENNAN S'    '00001534'    'MERRILL'
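
A shorter variant of the same "convert everything to strings" idea, offered as a sketch rather than a drop-in replacement: join each row into a single key string and call unique with 'stable' on the keys. It assumes that strjoin is available (R2013a or later), that the separator char(31) never occurs in the data, and that num2str keeps your numeric values distinct. Note that num2str rounds non-integers to a few significant digits and maps 183 and '183' to the same key. The names strs, keys and sep are illustrative.

```matlab
% Example input copied from the question
input_cell_array = {
    19970101 18659 183 '19980820' '00018659' 'RUNYON L' '00001534' 'MERRILL'
    19970101 18290 183 '19981221' '00018290' 'MANTON S' '00001534' 'MERRILL'
    19970101 10280 183 '19980819' '00010280' 'BRENNAN S' '00001534' 'MERRILL'
    19970101 18659 183 '19980820' '00018659' 'RUNYON L' '00001534' 'MERRILL'
    19970101 10280 183 '19980819' '00010280' 'BRENNAN S' '00001534' 'MERRILL'};

% Convert every cell to a string (strings pass through num2str unchanged)
strs = cellfun(@num2str, input_cell_array, 'UniformOutput', false);

% Build one key per row; the separator is assumed not to appear in the data
sep = char(31);
keys = cell(size(strs, 1), 1);
for r = 1:size(strs, 1)
    keys{r} = strjoin(strs(r, :), sep);
end

% 'stable' keeps the first occurrence of each key in original order
[~, idx] = unique(keys, 'stable');
out = input_cell_array(idx, :);
```

Because the keys form a plain cell array of strings, a single call to unique is enough and no per-column labelling is needed.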