Trying to read a text file... but not getting all the contents

Time: 2012-03-16 17:57:04

Tags: string matlab text file-io textscan

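I am trying to read a file in which the following format repeats (I have cut out most of the data, even for the first repetition, because it is too long):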

1.00 'day' 2011-01-02
'Total Velocity Magnitude RC - Matrix' 'm/day'
    0.190189     0.279141     0.452853      0.61355     0.757833     0.884577 
    0.994502      1.08952      1.17203      1.24442      1.30872      1.36653 
     1.41897      1.46675      1.51035      1.55003      1.58595      1.61824

The full data is here.

Download the actual file.

This is the code I am using to read the data from the file above:

fid = fopen(file_name); % open the file

dotTXT_fileContents = textscan(fid,'%s','Delimiter','\n'); % read it as string ('%s') into one big array, row by row
dotTXT_fileContents = dotTXT_fileContents{1};
fclose(fid); %# don't forget to close the file again

%# find rows containing 'Total Velocity Magnitude RC - Matrix' 'm/day'
data_starts = strmatch('''Total Velocity Magnitude RC - Matrix'' ''m/day''',...
    dotTXT_fileContents); % data_starts contains the line numbers wherever 'Total Velocity Magnitude RC - Matrix' 'm/day' is found

ndata = length(data_starts); % total no. of data values will be equal to the corresponding no. of '**  K' read from the .txt file

%# loop through the file and read the numeric data
for w = 1:ndata-1
    %# read lines containing numbers
    tmp_str = dotTXT_fileContents(data_starts(w)+1:data_starts(w+1)-3); % stores the content from file dotTXT_fileContents of the rows following the row containing 'Total Velocity Magnitude RC - Matrix' 'm/day' in form of string
    %# convert strings to numbers
    tmp_str = tmp_str{:}; % store the content of the string which contains data in form of a character
    %# assign output
    data_matrix_grid_wise(w,:) = str2num(tmp_str); % convert the part of the character containing data into number
end

To give you an idea of the data pattern in my text file, these are some results from the code:

data_starts =

           2
        1672
        3342
        5012
        6682
        8352
       10022

ndata =

     7

So my data_matrix_grid_wise should contain 1672-2-2-1 (for a new line) = 1667 rows. However, I get this result:

data_matrix_grid_wise =

  Columns 1 through 2

   0.190189000000000   0.279141000000000
   0.423029000000000   0.616590000000000
   0.406297000000000   0.604505000000000
   0.259073000000000   0.381895000000000
   0.231265000000000   0.338288000000000
   0.237899000000000   0.348274000000000

  Columns 3 through 4

   0.452853000000000   0.613550000000000
   0.981086000000000   1.289920000000000
   0.996090000000000   1.373680000000000
   0.625792000000000   0.859638000000000
   0.547906000000000   0.743446000000000
   0.562903000000000   0.759652000000000

  Columns 5 through 6

   0.757833000000000   0.884577000000000
   1.534560000000000   1.714330000000000
   1.733690000000000   2.074690000000000
   1.078000000000000   1.277930000000000
   0.921371000000000   1.080570000000000
   0.934820000000000   1.087410000000000

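Where am I going wrong? In my final result, data_matrix_grid_wise should consist of 10000 elements rather than 36. Thanks.

Update: How can I read the number before 'day', i.e. 1, 2, 3 and so on, from the line just before data_starts(w)? I am using this inside the loop, but it does not seem to work: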

days_str = dotTXT_fileContents(data_starts(w)-1);
    days_str = days_str{1};
    days(w,:) = sscanf(days_str(w-1,:), '%d %*s %*s', [1, inf]);

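2 Answers:

Answer 0: (score: 1)

The problem is in the last two statements. When you write tmp_str{:}, you expand the cell array into a comma-separated list of strings. If you assign this list to a single variable, only the first string is assigned. So tmp_str now holds only the first line of data.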
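To illustrate the point about comma-separated lists, here is a minimal sketch using three shortened lines from the question's sample data. The names `sample_lines`, `only_first`, `n_lines`, and `n_parsed` are illustrative and do not appear in the question. It relies on the MATLAB behaviour described above, where assigning the list to one variable keeps only the first item:

```matlab
% Three shortened sample lines standing in for the rows of tmp_str
sample_lines = {'0.190189 0.279141'; '0.994502 1.08952'; '1.41897 1.46675'};

% sample_lines{:} expands to a comma-separated list of three strings;
% with a single variable on the left, only the first string is kept
only_first = sample_lines{:};

n_lines  = numel(sample_lines);          % lines available in the cell array
n_parsed = numel(str2num(only_first));   % numbers parsed from the one kept line

fprintf('%d lines available, %d values parsed\n', n_lines, n_parsed);
```

This mirrors what happens in the loop: every block contributes only its first line, which is why each row of data_matrix_grid_wise ends up with six values.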

Here is what you can do instead of the last two lines:

tmp_mat = cellfun(@str2num, tmp_str, 'uniformoutput',0);
data_matrix_grid_wise(w,:) = cell2mat(tmp_mat);

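However, you will run into a concatenation problem (in cell2mat), because not all rows have the same number of columns. How you resolve that is up to you.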
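One possible way around the unequal row lengths, offered as an assumption rather than as part of this answer, is to join all lines of a block into one string, parse it with a single sscanf call, and keep each block's values in a cell array. The names `block_lines`, `joined`, `values`, and `data_blocks` are hypothetical, and the sample lines are shortened from the question's data so that they have different lengths:

```matlab
% Sample block whose lines deliberately hold different numbers of values
block_lines = {'0.190189 0.279141 0.452853'; '0.994502 1.08952'; '1.41897'};

% Join every line into one long string, separated by spaces
joined = sprintf('%s ', block_lines{:});

% Parse all numbers at once; sscanf returns a column, so transpose to a row
values = sscanf(joined, '%f').';

% Blocks may differ in length, so store each one in its own cell
data_blocks = cell(1, 1);
data_blocks{1} = values;
```

Inside the question's loop, `block_lines` would be `tmp_str` and the result would go into `data_blocks{w}`. A cell array is used here because a numeric matrix would require every block to contain the same number of values.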

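Answer 1: (score: 1)

The problem is in the line tmp_str = tmp_str{:};. MATLAB behaves strangely when dealing with chars. The short solution is to replace the last two lines of the loop with the following two: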

y = cell2mat( cellfun(@(z) sscanf(z,'%f'),tmp_str,'UniformOutput',false));
data_matrix_grid_wise(w,:) = y;
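For the update in the question (reading the number before 'day'), a minimal sketch along the same lines is to let sscanf read just the leading number from the header line. This is an assumption about what is wanted, not something either answer states. The names `day_line` and `day_number` are illustrative:

```matlab
% Sample header line copied from the question ('' is an escaped single quote)
day_line = '1.00 ''day'' 2011-01-02';

% Read at most one floating-point value from the start of the line
day_number = sscanf(day_line, '%f', 1);

% Inside the question's loop this would become something like:
%   days(w) = sscanf(dotTXT_fileContents{data_starts(w)-1}, '%f', 1);
```

Indexing the cell array with curly braces returns the line as a char row directly, so no further row indexing of the string is needed.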