Question

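I have implemented my algorithm in Matlab using cells of several strings, but I can't get it to work when the data is read from a file.

In Matlab, I created a string cell for each line; let's call each one line.

So I get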

     line = {'string1' 'string2' ...}
     line = {'string 5' 'string7' ...}
     line = ...

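and so on. I have more than 100 lines to read.

What I want to do is compare the words of the first line with themselves. Then I combine the first and second lines and compare the words of the second line with the combined cell. I keep accumulating every cell I have read and compare the accumulation with the cell read last.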
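The procedure described above can be sketched for any number of lines by keeping all lines in one cell array instead of separate variables a, b, c, .... This is a minimal illustration, not the asker's code: `allLines`, `seen` and `result` are hypothetical names, and the data is made up. Each step accumulates the unique words seen so far (in first-seen order) and compares them with the current line.

```matlab
% Hypothetical example data: allLines{k} holds the words of line k
allLines = { {'string1','string2','string3','string4'}, ...
             {'string2','string3'}, ...
             {'string4','string5','string6'} };

seen   = {};                      % unique words of all lines read so far, first-seen order
result = cell(size(allLines));    % result{k}: which words of "seen" occur on line k
for k = 1:numel(allLines)
    words = allLines{k};
    seen = [seen, words];                       % accumulate the current line
    [~, firstIdx] = unique(seen, 'first');      % first occurrence of each word
    seen = seen(sort(firstIdx));                % keep first-seen order
    result{k} = ismember(seen, words);          % compare the accumulation with line k
end
```

With this data, `result{2}` flags `string2` and `string3` among the four words accumulated after line 2.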

Here is my code, where the cells holding the lines are named a, b, c, d, ...:

AA = ismember(a, a);                      % line 1 compared with itself

combine = [a, b];                         % accumulate lines 1 and 2
[~, firstIdx] = unique(combine, 'first'); % first occurrence of each word
sorted = combine(sort(firstIdx));         % unique words in first-seen order
AB = ismember(sorted, b);                 % which accumulated words occur on line 2

combine1 = [a, b, c];

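.....

When I read my file, I use a while loop that reads the whole file until the end. Since every line then ends up in a string cell with the same name, how can I implement my algorithm?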

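    while ~feof(fid)
        out = fgetl(fid);
        % skip the end-of-file marker, blank lines and comment lines
        if ~ischar(out) || isempty(out) || strncmp(out, '%', 1)
            continue
        end
        line = regexp(out, ' ', 'split');   % words of the current line
    end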
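One way around the "same name" problem, offered here as an assumption rather than as the answer's method, is to append each line's words to a growing cell array, so that `allLines{1}`, `allLines{2}`, ... take the place of a, b, .... The temporary file and its contents are invented so the snippet runs on its own; `allLines` is a hypothetical name.

```matlab
% Write a small example file (replace with your own file)
fname = [tempname, '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%% a comment line\nstring1 string2 string3 string4\n\nstring2 string3\nstring4 string5 string6\n');
fclose(fid);

allLines = {};                                % allLines{k}: words of the k-th data line
fid = fopen(fname, 'r');
while ~feof(fid)
    txt = fgetl(fid);
    if ~ischar(txt) || isempty(strtrim(txt)) || strncmp(txt, '%', 1)
        continue                              % skip EOF marker, blank and comment lines
    end
    allLines{end+1} = regexp(strtrim(txt), '\s+', 'split');  % store words as a new cell
end
fclose(fid);
delete(fname);
```

Because every line gets its own index, the comparison for line k can use `allLines{k}` together with `allLines{1:k-1}`.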

Answer 1

Suppose your data file is called data.txt and contains:

string1 string2 string3 string4
string2 string3 
string4 string5 string6

A very simple way to keep just one occurrence of each string is:

% Parse everything in one go
fid = fopen('C:\Users\ok1011\Desktop\data.txt');
out = textscan(fid,'%s');
fclose(fid);

unique(out{1})
ans = 
    'string1'
    'string2'
    'string3'
    'string4'
    'string5'
    'string6'
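Note that `unique` returns the strings sorted alphabetically. If the order of first appearance matters, as in the question's combine/unique/sort step, a sketch using the `'first'` option looks like this (the `words` data is hypothetical). Newer MATLAB releases also accept `unique(words, 'stable')` for the same ordering.

```matlab
words = {'string3','string1','string3','string2','string1'};  % hypothetical example
[~, firstIdx] = unique(words, 'first');   % index of the first occurrence of each word
inOrder = words(sort(firstIdx));          % reorder by position in the input
% inOrder is {'string3','string1','string2'}
```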

As mentioned, this approach may not work if:

your data file has irregularities, or
you actually need to compare the indices.

Edit: a faster solution

% Parse in bulk and split (assuming you don't know the maximum
% number of strings in a line; otherwise you can use textscan alone)

fid = fopen('C:\Users\ok1011\Desktop\data.txt');
out = textscan(fid,'%s','Delimiter','\n');   % one cell per line
fclose(fid);
out = regexp(out{1},' ','split');            % out{ii} holds the words of line ii

% Build the list of unique words across all lines
comb = unique([out{:}]);
comb = comb(~cellfun('isempty',comb));       % drop empty strings left by extra spaces

% Preallocate idx
m   = size(out,1);
idx = false(m,numel(comb));

% Loop over the lines (rows)
for ii = 1:m
    idx(ii,:) = ismember(comb,out{ii});
end

Note that the resulting idx is:

idx =
     1     1     1     1     0     0
     0     1     1     0     0     0
     0     0     0     1     1     1
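As a hedged sketch (not part of the original answer), idx can also reproduce the question's cumulative comparison: for line k, the columns that are true in any of rows 1..k are the words accumulated so far, and row k restricted to those columns plays the role of `AB = ismember(sorted, b)`. Here the words follow comb's alphabetical order rather than first-appearance order. The data is typed in directly so the snippet runs alone; `result` is a hypothetical name.

```matlab
out  = { {'string1','string2','string3','string4'}; ...
         {'string2','string3'}; ...
         {'string4','string5','string6'} };
comb = reshape(unique([out{:}]), 1, []);    % unique words as a row
m    = size(out, 1);
idx  = false(m, numel(comb));
for ii = 1:m
    idx(ii,:) = ismember(comb, out{ii});
end

result = cell(m, 1);
for k = 1:m
    acc = any(idx(1:k,:), 1);      % columns of words seen on lines 1..k
    result{k} = idx(k, acc);       % which accumulated words occur on line k
end
```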

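The advantage of keeping the result in this form is that it saves space compared with a cell array (each cell carries 112 bytes of overhead). You can also store it as a sparse array to reduce the storage cost further.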
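A minimal sketch of the sparse option, using the idx shown above: `sparse` keeps a logical matrix logical and stores only its true entries. This only pays off when most entries are false, for example many distinct words with only a few per line; for a 3×6 matrix it is not smaller.

```matlab
idx = logical([1 1 1 1 0 0; 0 1 1 0 0 0; 0 0 0 1 1 1]);  % the idx shown above
S = sparse(idx);      % sparse logical matrix: only the true entries are stored
n = nnz(S);           % number of stored (true) entries
F = full(S);          % back to an ordinary logical matrix when needed
```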

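Another thing to note is that a logical array can still be used to index a shorter array (for example, a double array) as long as the extra elements are false; by construction of the problem above, idx meets this requirement. A clarifying example: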

A = 1:3;
A([true false true false false])   % returns [1 3]; the trailing false elements are ignored
