MATLAB: How do I import a text file with grouping information?

Time: 2016-03-23 10:09:04

Tags: matlab file-io import textscan

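I want to import a text file into MATLAB using textscan or textread.

The text files I have imported so far were always written row by row, so this worked well. Now the file I need to import is written like this: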

ID: 100  Part: 1    
    V1       V2      V3       V4           V5           V6          V7          V8
X1  33   0.1842831   87   0.9759678   10.07302940   -2.126099   0.9205776   0.9933037    
X2  31   0.1695875   87   0.9961777   18.10119586   -5.153099   0.9651591   0.9999865

ID: 101  Part: 1
    V1       V2      V3       V4           V5           V6          V7          V8
X1  45   0.1942831   87   0.9759678   10.07302940   -2.126099   0.9205776   0.9933037
X2  52   0.1666875   87   0.9961777   18.10119586   -5.153099   0.9651591   0.9999865

....

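This continues for about 200 IDs, with Part ranging from 1 to 3.

What I want to create is 10 vectors, which will later form a 2D matrix. The information in each header line (ID, Part) should be repeated for every data row that follows it: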

ID = [100; 100; 101; 101; ...]
Part = [1; 1; 1; 1; ...]
V1 = [33; 31; 45; 52;...]
V2 = [0.1842831; 0.1695875; 0.1942831; 0.1666875; ...]
V3 = [....]
...

From the Q&A here I could not work out how to extract the information in the header line of each subset of values.

How can I achieve this?

2 answers:

Answer 0 (score: 0)

We can parse the file line by line:

% store data here
ID = [];
PART = [];
V = zeros(0,8);

% loop over lines, each iteration parses one section
fid = fopen('file.dat', 'rt');
while ~feof(fid)
    % first line is a header with ID/PART
    tline = fgetl(fid);
    C = textscan(tline, '%*s %d %*s %d');
    ID = [ID; C{1}];
    PART = [PART; C{2}];

    % second line is another header, ignored
    tline = fgetl(fid);

    % next two lines contain samples
    for i=1:2
        tline = fgetl(fid);
        C = textscan(tline, ['%*s' repmat(' %f',1,8)]);
        V = [V; cell2mat(C)];
    end
end
fclose(fid);

% convert to cell array, each cell represents a column: V1, V2, ..., V8
V = num2cell(V,1);

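Using this sample file, file.dat (note that it has no blank lines between sections, which the loop above relies on):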

ID: 100 Part: 1
   V1       V2       V3       V4       V5       V6       V7      V8
X1 33 0.1842831 87 0.9759678 10.07302940 -2.126099 0.9205776 0.9933037
X2 31 0.1695875 87 0.9961777 18.10119586 -5.153099 0.9651591 0.9999865
ID: 101 Part: 1
   V1       V2       V3       V4       V5       V6       V7      V8
X1 45 0.1942831 87 0.9759678 10.07302940 -2.126099 0.9205776 0.9933037
X2 52 0.1666875 87 0.9961777 18.10119586 -5.153099 0.9651591 0.9999865

I get the following result:

>> ID
ID =
         100
         101
>> PART
PART =
           1
           1
>> V
V = 
  Columns 1 through 6
    [4x1 double]    [4x1 double]    [4x1 double]    [4x1 double]    [4x1 double]    [4x1 double]
  Columns 7 through 8
    [4x1 double]    [4x1 double]
>> celldisp(V)
V{1} =
    33
    31
    45
    52
..snip..
V{8} =
    0.9933
    1.0000
    0.9933
    1.0000
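
Note that the loop stores one ID and one PART per section, whereas the question asks for them to be repeated on every data row. A minimal sketch of that last step, assuming every section holds exactly two sample rows (as in the sample file) and that V is still the numeric matrix from before the num2cell call. The names rowsPerSection, Vnum, IDrows, PARTrows and M are made up for illustration, and Vnum is truncated to two columns for brevity. repelem requires R2015a or later.

```matlab
% Values as the loop produces them for the two-section sample file
ID   = int32([100; 101]);
PART = int32([1; 1]);
Vnum = [33 0.1842831; 31 0.1695875; 45 0.1942831; 52 0.1666875];  % V1 and V2 only

rowsPerSection = 2;   % assumption: every section has exactly two data rows

% repeat each header value once per data row of its section
IDrows   = repelem(double(ID),   rowsPerSection);
PARTrows = repelem(double(PART), rowsPerSection);

% assemble the 2D matrix: ID, Part, then the value columns
M = [IDrows, PARTrows, Vnum];
```

The double() conversion is there because textscan with %d returns int32, and concatenating int32 with the fractional values would otherwise cast the whole matrix to an integer type.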

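Answer 1 (score: 0)

If your file is large, I would consider preallocating memory for the cell arrays. If you do not know the number of lines, allocate a very large array and delete all the empty cells after reading. I have not included the preallocation in this code, but it is easy to add. Let me know if you need anything.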

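fid = fopen('test.txt', 'r');
ID = {};
Part = {};
V = {};   % all tokens are stored as strings
tline = fgetl(fid);
while ischar(tline)   % fgetl returns -1 (a double) at end of file
    if ~isempty(tline)
        % split the line into whitespace-separated tokens
        s = textscan(tline,'%s');
        s = s{1};
        n = length(s);
        if n == 4
            % header line "ID: <id> Part: <part>": keep tokens 2 and 4
            ID(end+1) = s(2);
            Part(end+1) = s(4);
        elseif n == 9
            % data line: row label followed by the eight values V1..V8
            V(end+1,1:8) = s(2:end);
        end
        % the "V1 ... V8" column header has 8 tokens and is skipped
    end
    tline = fgetl(fid);
end
fclose(fid);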
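
The preallocation mentioned above could look roughly like the sketch below. It also converts the tokens to numbers and repeats the current ID and Part on every data row, which is what the question asks for. To keep the example self-contained, the cell array txt stands in for the lines that fgetl would return. maxRows is an assumed upper bound, and the names txt, M, nRows, curID and curPart are made up for illustration.

```matlab
% Stand-in for the file contents (one cell per line, blank separator included)
txt = { ...
    'ID: 100 Part: 1', ...
    '   V1 V2 V3 V4 V5 V6 V7 V8', ...
    'X1 33 0.1842831 87 0.9759678 10.07302940 -2.126099 0.9205776 0.9933037', ...
    'X2 31 0.1695875 87 0.9961777 18.10119586 -5.153099 0.9651591 0.9999865', ...
    '', ...
    'ID: 101 Part: 1', ...
    '   V1 V2 V3 V4 V5 V6 V7 V8', ...
    'X1 45 0.1942831 87 0.9759678 10.07302940 -2.126099 0.9205776 0.9933037', ...
    'X2 52 0.1666875 87 0.9961777 18.10119586 -5.153099 0.9651591 0.9999865'};

maxRows = 1000;            % assumed upper bound on the number of data rows
M = nan(maxRows, 10);      % preallocated; columns: ID, Part, V1..V8
nRows = 0;
curID = NaN;
curPart = NaN;

for k = 1:numel(txt)
    tok = strsplit(strtrim(txt{k}));      % whitespace-separated tokens
    if numel(tok) == 4                    % header line: ID: <id> Part: <part>
        curID   = str2double(tok{2});
        curPart = str2double(tok{4});
    elseif numel(tok) == 9                % data line: label plus 8 values
        nRows = nRows + 1;
        M(nRows, :) = [curID, curPart, str2double(tok(2:end))];
    end                                   % anything else is skipped
end

M = M(1:nRows, :);         % drop the unused preallocated rows
```

Each column of M is then one of the ten vectors, for example ID = M(:,1) and V1 = M(:,3).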