I have a large text file that looks like this:
PMID- 123456123
OWN - NLM
DA - 20160930

PMID- 27689094
OWN - NLM
VI - 2016
DP - 2016

PMID- 27688828
OWN - NLM
STAT- Publisher
DA - 20160930
LR - 20160930
and so on. I want to split the text file into smaller text files at each blank line, and name each smaller file after its PMID number, like this:
The file '123456123.txt' contains:
PMID- 123456123
OWN - NLM
DA - 20160930
The file '27689094.txt' contains:
PMID- 27689094
OWN - NLM
VI - 2016
DP - 2016
The file '27688828.txt' contains:
PMID- 27688828
OWN - NLM
STAT- Publisher
DA - 20160930
LR - 20160930
Here is my attempt. I know how to identify a blank line (I think), but I don't know how to split the text and save it as smaller text files:
fid = fopen(filename);
text = fgets(fid);
blankline = sprintf('\r\n');
while ischar(text)
    if strcmp(blankline, text)
        % split the text
    else
        % write the text to the smaller file
    end
    text = fgets(fid);  % read the next line; without this the loop never ends
end
fclose(fid);
Answer 0 (score: 2)
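You can read the entire file at once and then use regexp to split the contents at the blank lines. You can then use regexp again to extract the PMID of each group, loop over all the pieces, and save each one. Processing the file as one large string like this is likely to be more efficient than reading it line by line with fgets.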
% Tell it what folder you want to put the files in
outdir = '/my/folder';

% Read the initial file in all at once
fid = fopen(filename, 'r');
data = fread(fid, '*char').';
fclose(fid);

% Break it into pieces based upon empty lines
pieces = regexp(data, '\n\s*\n', 'split');

% For each piece get the PMID (at least one digit after "PMID-")
pmids = regexp(pieces, '(?<=PMID-\s*)\d+', 'match', 'once');

% Now loop through and save each one
for k = 1:numel(pieces)
    % Skip pieces without a PMID (e.g. trailing blank lines at the end of the file)
    if isempty(pmids{k})
        continue
    end

    % Use the PMID of this piece to construct the output filename
    outfile = fullfile(outdir, [pmids{k}, '.txt']);

    % Now write the piece to the file
    fid = fopen(outfile, 'w');
    fwrite(fid, pieces{k});
    fclose(fid);
end
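To check what the two regexp calls do before writing anything to disk, you could try them on a small in-memory string first. The following is only a sketch: the variable `sample` is made up for illustration, it is built from a few of the records in the question, and the Windows-style `\r\n` line endings are an assumption chosen to show that `\s*` also absorbs the `\r`. The PMID pattern is the lookbehind from the answer, written with `\d+` so that at least one digit is required.

```matlab
% Hypothetical in-memory sample (not read from disk): three records taken
% from the question, separated by blank lines, with \r\n line endings.
sample = sprintf(['PMID- 123456123\r\nOWN - NLM\r\nDA - 20160930\r\n\r\n' ...
                  'PMID- 27689094\r\nOWN - NLM\r\n\r\n' ...
                  'PMID- 27688828\r\nOWN - NLM\r\nSTAT- Publisher\r\n']);

% Split wherever a newline is followed by optional whitespace and another newline
pieces = regexp(sample, '\n\s*\n', 'split');

% Extract the digits that follow "PMID-" in each piece
pmids = regexp(pieces, '(?<=PMID-\s*)\d+', 'match', 'once');

% Show which file each piece would go to, without actually writing it
for k = 1:numel(pieces)
    fprintf('%s.txt would receive %d characters\n', pmids{k}, numel(pieces{k}));
end
```

With `\r\n` input, each piece except the last keeps a trailing `\r`, because the split pattern starts at the `\n`. If that matters, passing each piece through strtrim before writing it is one way to remove it.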