Extract filename and extension from a cell array of strings

Time: 2017-07-05 08:15:20

Tags: octave

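I want to strip the directory part from a cell array of strings containing filenames. Of course, one way would be to loop over the cell array and call fileparts on each element, but I have more than 1e5 files and speed really matters.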
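For reference, the loop-based baseline mentioned above might look like the following sketch. It is only meant as a readable point of comparison; the loop variable k and the preallocation are illustrative choices.

```octave
fns = {"/usr/local/foo.lib", "~/baz.m", "home/rms/eula.txt", "bar.m"};

## Preallocate the result, then split each path with fileparts.
filenames = cell (size (fns));
for k = 1:numel (fns)
  ## fileparts returns directory, base name and extension (with the dot);
  ## the directory is discarded here.
  [~, name, ext] = fileparts (fns{k});
  filenames{k} = [name, ext];
endfor
```

This calls fileparts once per element, which is the per-file overhead the question is trying to avoid.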

My current approach is:

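fns = {"/usr/local/foo.lib", "~/baz.m", "home/rms/eula.txt", "bar.m"}

## rindex gives the position of the last file separator in each name
## (0 if there is none), so fn(s+1:end) keeps only what follows it.
filenames = cellfun (@(fn, s) fn(s+1:end), fns,
                     num2cell (rindex (fns, filesep())),
                     "UniformOutput", false)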

which gives the desired output:

fns = 
{
  [1,1] = /usr/local/foo.lib
  [1,2] = ~/baz.m
  [1,3] = home/rms/eula.txt
  [1,4] = bar.m
}
filenames = 
{
  [1,1] = foo.lib
  [1,2] = baz.m
  [1,3] = eula.txt
  [1,4] = bar.m
}

This takes about 2e-5 s per file. Is there a better (faster, more readable) way to do this?

Edit: I have added Sardar's solution, my earlier attempt using regexp, and some benchmark results:

fns = {"/usr/local/foo.lib", "~/baz.m", "home/rms/eula.txt", "bar.m"};
fns = repmat (fns, 1, 1e4);

tic
f1 = cellfun (@(fn, s) fn(s+1:end), fns,
              num2cell (rindex (fns, "/")),
              "UniformOutput", false);
toc

tic
[~, ~, ~, M] = regexp (fns, "[^\/]+$", "lineanchors");
f2 = cell2mat (M);
toc

tic
## Answer from Sardar Usama
f3 = regexprep (fns, '.*/', '');
toc

assert (f1, f2)
assert (f1, f3)

which gives:

Elapsed time is 0.729995 seconds.  (Original code with cellfun)
Elapsed time is 0.67545 seconds.   (using regexp)
Elapsed time is 0.230487 seconds.  (using regexprep)

1 answer:

Answer 0 (score: 2)

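Use regexprep to match everything up to the last / and replace the match with an empty string. Because .* is greedy, the match extends to the last slash in each string; names without a slash are left unchanged.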

filenames = regexprep(fns, '.*/', '');
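
Since the title also asks about the extension, the same idea could probably be extended along the lines of the sketch below. The variables exts and stems are illustrative names, not part of the original answer, and the patterns assume that the extension is the last dot in the file name together with everything after it.

```octave
fns = {"/usr/local/foo.lib", "~/baz.m", "home/rms/eula.txt", "bar.m"};

## Greedy ".*" runs up to the last "/", so only the file name remains.
filenames = regexprep (fns, '.*/', '');

## Hypothetical helper variables, not part of the original answer:
## the extension is the last "." and everything after it ...
exts = regexp (filenames, '\.[^.]*$', 'match', 'once');
## ... and the stem is the file name with that extension removed.
stems = regexprep (filenames, '\.[^.]*$', '');
```

For the four sample names this should give the stems foo, baz, eula, bar and the extensions .lib, .m, .txt, .m, which follows the fileparts convention of keeping the dot. Single-quoted patterns are used so that the backslash reaches the regular expression engine untouched.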