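Reading a text file with regular expressions and storing it in a struct

Time: 2012-09-05 13:54:13

Tags: regex matlab struct io

Hi everyone,

I am trying to parse a text file into MATLAB. It consists of several blocks (START_BLOCK / END_BLOCK), inside which strings (variables) and values (each associated with the preceding variable) are assigned.

An example is: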

START_BLOCK_EXTREMEWIND
velocity_v1 29.7


velocity_v50    44.8


velocity_vred1  32.67
velocity_vred50 49.28


velocity_ve1    37.9



velocity_ve50   57


velocity_vref   50



END_BLOCK_EXTREMEWIND

Currently, my code is:

fid = fopen('test_struct.txt','rt');
C = textscan(fid,'%s %f32 %*[^\n]','CollectOutput',true);
fclose(fid);  % release the file handle once the data has been read
C{1} = reshape(C{1},1,numel(C{1}));
C{2} = reshape(C{2},1,numel(C{2}));



startIdx = find(~cellfun(@isempty, regexp(C{1}, 'START_BLOCK_', 'match')));
endIdx = find(~cellfun(@isempty, regexp(C{1}, 'END_BLOCK_', 'match')));
assert(all(size(startIdx) == size(endIdx)))
extract_parameters = @(n)({C{1}{startIdx(n)+1:endIdx(n) - 1}});
parameters = arrayfun(extract_parameters, 1:numel(startIdx), 'UniformOutput', false);

s = cell2struct(cell(size(parameters{1})),parameters{1}(1:numel(parameters{1})),2);

s.velocity_v1 = C{2}(2);
s.velocity_v50 = C{2}(3);
s.velocity_vred1 = C{2}(4);
s.velocity_vred50 = C{2}(5);
s.velocity_ve1 = C{2}(6);
s.velocity_ve50 = C{2}(7);
s.velocity_vref = C{2}(8);

It works, but it is completely static. I would rather have code that can:

1. check the existence of blocks --> as already implemented;
2. the strings are to be taken as fields of the structure;
3. the numbers are meant to be the attributes of each field.

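Finally, if there are several blocks, the code should iterate over them to build the whole structure. This is my first time coding with structs, so please be patient.

Thank you in advance.

Best regards.

2 Answers:

Answer 0: (score: 1)

It sounds like you will want to use dynamic field names. If you have a struct s, a string fieldName that stores the field name, and fieldVal containing the value you want to set for this field, then you can perform the assignment with the following syntax: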

s.(fieldName) = fieldVal;

This MATLAB doc provides more information.
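As a minimal sketch of that syntax in a loop, the snippet below fills a struct from a list of names and values. The variables fieldNames and fieldVals are hypothetical and only mirror a few entries of the sample file. Each name must be a valid MATLAB identifier (isvarname can check this), otherwise the assignment raises an error.

```matlab
% Hypothetical names and values, chosen to mirror the sample file above.
fieldNames = {'velocity_v1', 'velocity_v50', 'velocity_vref'};
fieldVals  = [29.7, 44.8, 50];

s = struct();                           % start from an empty scalar struct
for k = 1:numel(fieldNames)
    assert(isvarname(fieldNames{k}));   % field names must be valid identifiers
    s.(fieldNames{k}) = fieldVals(k);   % dynamic field assignment
end

disp(s.velocity_v50)                    % a field set dynamically reads back as usual
```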

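With that in mind, I took a slightly different approach to parsing the text: I iterate over it with a for loop. Although for loops are sometimes frowned upon in MATLAB (because MATLAB is optimized for vectorized operations), I think in this case the loop makes the code clearer. Also, my understanding is that if you would otherwise have to use arrayfun, replacing it with a for loop probably won't cost any real performance anyway.

The following code converts each block in the text into a struct with the specified fields and values. These resulting "block" structs are then added to a higher-level "result" struct.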

fid = fopen('test_struct.txt','rt');
C = textscan(fid,'%s %f32 %*[^\n]','CollectOutput',true);
fclose(fid);

paramNames = C{1};
paramVals = C{2};

curBlockName = [];
inBlock = 0;
blockCount = 0;

%// Iterate through all of the entries in "paramNames".  Each block will be a
%// new struct that is then added to a high-level "result" struct.
for i=1:length(paramNames)
    curParamName = paramNames{i};
    isStart = ~isempty(regexp(curParamName, 'START_BLOCK_', 'match'));
    isEnd = ~isempty(regexp(curParamName, 'END_BLOCK_', 'match'));

    %// If at the start of a new block, create a new struct with a single
    %// field - the BlockName (as specified by the text after "START_BLOCK_").
    if(isStart)
        assert(inBlock == 0);
        curBlockName = curParamName(length('START_BLOCK_') + 1:end);
        inBlock = 1;
        blockCount = blockCount + 1;
        s = struct('BlockName', curBlockName);          

    %// If at the end of a block, add the struct that we've just populated to
    %// our high-level "result" struct.
    elseif(isEnd)
        assert(inBlock == 1);
        inBlock = 0;
        %// EDIT - storing result in "structure of structures"
        %//  rather than array of structs
        %// s_array(blockCount) = s;
        result.(curBlockName) = s;

    %// Otherwise, assume that we are inside of a block, so add the current
    %// parameter to the struct.
    else
        assert(inBlock == 1);
        s.(curParamName) = paramVals(i);
    end
end

%// Results stored in "result" structure
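To illustrate how the resulting "structure of structures" might be consumed, here is a small sketch. It does not read the file. Instead it builds a stand-in result by hand (an assumption, with only two parameters of the sample block) and walks it with fieldnames and dynamic field access. The values are stored as single because the format string uses %f32.

```matlab
% Stand-in for the "result" struct the loop above would build from the
% sample file; only two of the parameters are reproduced here.
result.EXTREMEWIND = struct('BlockName', 'EXTREMEWIND', ...
                            'velocity_v1', single(29.7), ...
                            'velocity_v50', single(44.8));

blockNames = fieldnames(result);            % one entry per parsed block
for b = 1:numel(blockNames)
    blk = result.(blockNames{b});
    % Skip the bookkeeping field; setdiff returns the rest in sorted order.
    params = setdiff(fieldnames(blk), {'BlockName'});
    for p = 1:numel(params)
        fprintf('%s.%s = %g\n', blockNames{b}, params{p}, blk.(params{p}));
    end
end
```

A single value can also be read directly, for example result.EXTREMEWIND.velocity_v50.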

Hope this answers your question... or at least provides some useful hints.

Answer 1: (score: 0)

I edited my code today, and now it works almost as intended:

clc, clear all, close all

%Find all row headers
fid = fopen('test_struct.txt','r');
row_headers = textscan(fid,'%s %*[^\n]','CommentStyle','%','CollectOutput',1);
row_headers = row_headers{1};
fclose(fid);

%Find all attributes
fid1 = fopen('test_struct.txt','r');
attributes = textscan(fid1,'%*s %s','CommentStyle','%','CollectOutput',1);
attributes = attributes{1};
fclose(fid1);

%Collect row headers and attributes in a single cell
parameters = [row_headers,attributes];


%Find all the blocks
startIdx = find(~cellfun(@isempty, regexp(parameters, 'BLOCK_START_', 'match')));
endIdx = find(~cellfun(@isempty, regexp(parameters, 'BLOCK_END_', 'match')));
assert(all(size(startIdx) == size(endIdx)))


%Extract fields between BLOCK_START_ and BLOCK_END_
extract_fields = @(n)(parameters(startIdx(n)+1:endIdx(n)-1,1));
struct_fields = arrayfun(extract_fields, 1:numel(startIdx), 'UniformOutput', false);

%Extract attributes between BLOCK_START_ and BLOCK_END_
extract_attributes = @(n)(parameters(startIdx(n)+1:endIdx(n)-1,2));
struct_attributes = arrayfun(extract_attributes, 1:numel(startIdx), 'UniformOutput', false);


for i = 1:numel(struct_attributes)
    s{i} = cell2struct(struct_attributes{i},struct_fields{i},1);
end

Now, finally, I get a cell array of structs which, so to speak, meets my requirements. The only point I would like to improve is:

- Give each structure the name of the respective block.

Does anyone have a useful hint?

Thank you all for your support.

Regards,
Francis
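For the remaining point in this answer (naming each structure after its block), one possible sketch reuses the dynamic field names from the first answer. The variables blockHeaders, s, and named below are hypothetical stand-ins: blockHeaders plays the role of the header strings found at startIdx, and s the cell array of structs built by the final loop. The values are character strings because the attributes were read with %s.

```matlab
% Hypothetical stand-ins for the variables built in the code above.
blockHeaders = {'BLOCK_START_EXTREMEWIND'};   % header line of each block
s = {struct('velocity_v1', '29.7', 'velocity_v50', '44.8')};

named = struct();
for i = 1:numel(s)
    % Strip the marker prefix to obtain the bare block name.
    blockName = regexprep(blockHeaders{i}, '^BLOCK_START_', '');
    named.(blockName) = s{i};                 % store the struct under that name
end
```

With this layout a value would be reached as named.EXTREMEWIND.velocity_v50, and str2double could convert it to a number if needed.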