Question

I am using MATLAB to extract words from text files. I have several text files, and I want to textscan the 'AB' part of each file.

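As far as I know, I can read a specific line from a text file. However, I want to apply the same code to all the text files in a folder, and the line number of the AB part is different in each file, so I would have to change it every time.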

This is what all my text files look like (sample):

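PMID- 27401974
OWN - NLM
STAT- Publisher
DP - 2016 Jul 8
TI - North-Seeking Magnetotactic Gammaproteobacteria in the Southern Hemisphere
LID - AEM.01545-16 [pii]
AB - Magnetotactic bacteria (MTB) comprise a phylogenetically diverse group of
      prokaryotes capable of orienting and navigating along magnetic field lines. Under
      oxic conditions, MTB in natural environments in the Northern Hemisphere generally
      display north-seeking (NS) polarity, swimming parallel to the Earth's magnetic
      field lines, while those in the Southern Hemisphere generally swim antiparallel
      to magnetic field lines (south-seeking (SS) polarity).
CI - Copyright (c) 2016, American Society for Microbiology. All Rights Reserved.
FAU - Leao, Pedro
AU - Leao P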

Thanks in advance!

Answer 1

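I think regexp is your friend here. Instead of reading a fixed line number, scan the file line by line, start collecting text at the line that begins with the AB tag, and stop at the next line that begins with another tag (such as "CI -"):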

fid = fopen('/path/to/file.txt');
if fid == -1
    error('Could not open file.');
end
tline = fgetl(fid);   % "tline" avoids shadowing the built-in line function
target = '';
found_ab = false;
while ischar(tline)
    tline = strtrim(tline); % remove leading and trailing white space
    if ~found_ab
        % look for the line that starts the abstract, e.g. "AB - text..."
        res = regexp(tline, '^AB\s*-\s*(.*)$', 'tokens', 'once');
        if ~isempty(res)
            target = res{1};
            found_ab = true;
        end
    else
        % we are inside the AB field; a new tag line (e.g. "CI - " or "CI  - ") ends it
        res = regexp(tline, '^[A-Z]+\s*-\s', 'once');
        if ~isempty(res)
            % we reached the end of the AB lines
            break;
        end
        % the abstract continues on this line; join with a space so words do not merge
        target = [target, ' ', tline];
    end
    tline = fgetl(fid);
end
fclose(fid);
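Since the question asks to do this for every text file in a folder, here is a minimal sketch of one way to loop over them. It assumes each file holds a single record in the format of the sample above, and it uses the same end-of-field rule as the code above (the next line starting with an uppercase tag followed by "-"), plus a blank line as an extra stop. Instead of fgetl, it reads each file with fileread and works on a cell array of lines. The demo setup writes a hypothetical sample.txt into a temporary folder so the block runs as-is; set folder to your own directory.

```matlab
% --- demo setup (hypothetical): write one sample record to a temporary folder ---
demo_folder = tempname;
mkdir(demo_folder);
fid = fopen(fullfile(demo_folder, 'sample.txt'), 'w');
fprintf(fid, ['PMID- 27401974\n' ...
              'AB  - Magnetotactic bacteria (MTB) comprise\n' ...
              '      a diverse group of prokaryotes.\n' ...
              'CI  - Copyright (c) 2016.\n']);
fclose(fid);

folder = demo_folder;   % replace with the folder that holds your .txt files
files = dir(fullfile(folder, '*.txt'));
abstracts = cell(numel(files), 1);

for k = 1:numel(files)
    % read the whole file and split it into trimmed lines
    txt = fileread(fullfile(folder, files(k).name));
    txtLines = strtrim(regexp(txt, '\r?\n', 'split'));

    % a line such as "CI  - ..." or "PMID- ..." starts a new field;
    % a blank line (e.g. the trailing newline) is also treated as a stop
    isTag = ~cellfun(@isempty, regexp(txtLines, '^[A-Z]+\s*-\s', 'once'));
    isStop = isTag | cellfun(@isempty, txtLines);

    abStart = find(~cellfun(@isempty, regexp(txtLines, '^AB\s*-', 'once')), 1);
    if isempty(abStart)
        abstracts{k} = '';   % this file has no AB field
        continue;
    end

    % the AB field runs until the next stop line, or to the end of the file
    nextStop = find(isStop(abStart+1:end), 1);
    if isempty(nextStop)
        abEnd = numel(txtLines);
    else
        abEnd = abStart + nextStop - 1;
    end

    % drop the "AB  - " prefix and join the wrapped lines with single spaces
    block = txtLines(abStart:abEnd);
    block{1} = regexprep(block{1}, '^AB\s*-\s*', '');
    abstracts{k} = strjoin(block, ' ');
end

% --- demo cleanup: removes only the temporary demo folder ---
rmdir(demo_folder, 's');
```

After the loop, abstracts{k} holds the abstract of files(k).name as one string, or an empty char if that file has no AB field, so no line numbers have to be adjusted per file.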
