Problem using textscan() in Octave 4.0.3 (3 million lines / 250 MB file)

Date: 2016-10-14 12:52:46

Tags: matlab octave

I am trying to rewrite a piece of MATLAB code so that it runs in Octave, but I am having trouble with the textscan() function.

Original code (MATLAB):

function data = import_file(filename, startRow, endRow)

delimiter = ' ';
if nargin<=2
    startRow = 3;
    endRow = inf;
end

formatSpec = '%f%f%f%f%f%f%*s%[^\n\r]';

fileID = fopen(filename,'r');

dataArray = textscan(fileID, formatSpec, endRow(1)-startRow(1)+1, 'Delimiter', delimiter, 'MultipleDelimsAsOne', true, 'EmptyValue' ,NaN,'HeaderLines', startRow(1)-1, 'ReturnOnError', false);
for block=2:length(startRow)
    frewind(fileID);
    dataArrayBlock = textscan(fileID, formatSpec, endRow(block)-startRow(block)+1, 'Delimiter', delimiter, 'MultipleDelimsAsOne', true, 'EmptyValue' ,NaN,'HeaderLines', startRow(block)-1, 'ReturnOnError', false);
    for col=1:length(dataArray)
        dataArray{col} = [dataArray{col};dataArrayBlock{col}];
    end
end

fclose(fileID);

data = [dataArray{1:end-1}];

end

Error:

error: strread: %q, %c, %[] or bit width format specifiers are not supported yet.
error: called from
     strread at line 329 column 7
     textscan at line 321 column 8
     import_file at line 13 column 15
     main at line 52 column 22

Sample data:

# U  POINT_DATA 3711396
#  x  y  z  U_x  U_y  U_z  
739263.5 9363820 172.809998 -5.34212399 -0.0408997531 0.0736143066
739263.5 9363789 172.979996 -5.34212399 -0.0408997531 0.0736143066
739294.312 9363820 172.449997 -5.34212399 -0.0408997531 0.0736143066
739294.312 9363789 173.710007 -5.34212399 -0.0408997531 0.0736143066
739325.125 9363820 170.699997 -5.248474 -0.00403332808 0.041700209
739325.125 9363789 172.350006 -5.37227834 -0.0307070923 0.0492642202
739355.938 9363820 168.690002 -5.248474 -0.00403332808 0.041700209
739355.938 9363789 170.5 -5.37227834 -0.0307070923 0.0492642202
739386.75 9363820 169.110001 -5.248474 -0.00403332808 0.041700209
739386.75 9363789 170.839996 -5.37227834 -0.0307070923 0.0492642202
739417.562 9363820 170.789993 -5.248474 -0.00403332808 0.041700209
739417.562 9363789 171.820007 -5.37227834 -0.0307070923 0.0492642202

I have already tried other functions, such as dlmread(), load() and even fgetl(), but they all take far too long compared with the 8 s the import used to take in MATLAB.

Replacing the format spec with '%f%f%f%f%f%f' did not work either.

The file contains 3711396 lines (250 MB) of data, split into six columns.

Can you help me adapt the code?

1 Answer

Answer (score: 1)

I looked through your code and found two things that prevent it from running.

The first is the use of %[^\n\r] in your formatSpec.

The second is the use of the 'ReturnOnError' name/value pair.

Octave does not support either of these yet.
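To see both changes in isolation, the sketch below writes a two-row excerpt of the sample data (with its two `#` header lines) to a temporary file and reads it back using only a plain six-`%f` format and options Octave accepts. The temporary file from `tempname` is just test scaffolding; it assumes, as in the sample, that every data row holds exactly six whitespace-separated numbers.

```octave
% Minimal sketch: read an excerpt of the sample file with an
% Octave-compatible textscan call (no %[^\n\r] field, no 'ReturnOnError').
fname = [tempname() '.txt'];   % hypothetical temporary test file
fid = fopen(fname, 'w');
fprintf(fid, '# U  POINT_DATA 3711396\n');
fprintf(fid, '#  x  y  z  U_x  U_y  U_z  \n');
fprintf(fid, '739263.5 9363820 172.809998 -5.34212399 -0.0408997531 0.0736143066\n');
fprintf(fid, '739263.5 9363789 172.979996 -5.34212399 -0.0408997531 0.0736143066\n');
fclose(fid);

fid = fopen(fname, 'r');
% Six numeric columns; the two '#' lines are skipped via 'HeaderLines'.
C = textscan(fid, '%f%f%f%f%f%f', 'HeaderLines', 2);
fclose(fid);
delete(fname);

data = [C{:}];                 % concatenate the six column cells into a 2x6 matrix
disp(size(data));
```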

With the following modified code I was able to import the sample data you provided:

function data = import_file(filename, startRow, endRow)

if nargin<=2
    startRow = 3;
    endRow = inf;
end

formatSpec = '%f%f%f%f%f%f';
% Corrected formatSpec to import 6 consecutive floats 

fileID = fopen(filename,'r');

dataArray = textscan(fileID, formatSpec, endRow(1)-startRow(1)+1,...
    'EmptyValue' ,NaN,...
    'HeaderLines', startRow(1)-1); 
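% Removed 'ReturnOnError' as it is not yet implemented in Octave.
% 'Delimiter' and 'MultipleDelimsAsOne' are also omitted: with the default
% whitespace delimiters, runs of spaces are treated as a single separator.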

for block=2:length(startRow)
    frewind(fileID);

    dataArrayBlock = textscan(fileID, formatSpec,...
        endRow(block)-startRow(block)+1,...
        'EmptyValue' ,NaN,...
        'HeaderLines', startRow(block)-1);

    for col=1:length(dataArray)
        dataArray{col} = [dataArray{col};dataArrayBlock{col}];
    end
end

fclose(fileID);

data = [dataArray{1:end}];
% Changed 'end-1' to 'end': without the trailing %[^\n\r] field,
% all six cells hold data columns.

end
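The fix above removes the errors but does not address the speed concern from the question. One commonly used alternative for purely numeric files with a fixed number of values per row is to skip the header with fgetl and then read everything else with a single fscanf call. This is a swapped-in technique, not the answer's method, and its speed on the full 250 MB file has not been measured here. The sketch assumes exactly two `#` header lines and six numbers on every data row (a short or malformed row would shift all later values into the wrong columns); the temporary file and its two rows are only test data.

```octave
% Sketch of an fscanf-based alternative (not part of the answer above).
% Assumes two '#' header lines, then six numbers per row and nothing else.
fname = [tempname() '.txt'];   % hypothetical temporary test file
fid = fopen(fname, 'w');
fprintf(fid, '# U  POINT_DATA 3711396\n#  x  y  z  U_x  U_y  U_z  \n');
fprintf(fid, '739325.125 9363820 170.699997 -5.248474 -0.00403332808 0.041700209\n');
fprintf(fid, '739325.125 9363789 172.350006 -5.37227834 -0.0307070923 0.0492642202\n');
fclose(fid);

nHeader = 2;                   % number of header lines to skip
fid = fopen(fname, 'r');
for k = 1:nHeader
  fgetl(fid);                  % read and discard one header line
end
% fscanf fills a 6-by-N matrix column by column, in file order,
% so transposing gives one file row per matrix row.
data = fscanf(fid, '%f', [6 Inf]).';
fclose(fid);
delete(fname);
```

For the real file, set fname to its path and drop the lines that write and delete the test file.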