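I have very large data files in .txt format (typically 30 GB to 60 GB). I'd like to find a way to decimate these files automatically without first importing them into memory. My .txt files consist of two columns of data; here is a sample file: https://www.dropbox.com/s/87s7qug8aaipj31/RTL5_57.txt
So far, what I've done is import the data into a variable "C" and then downsample it. The problem with this approach is that "C" often fills MATLAB's available memory before the program ever reaches the decimation step: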
function [] = textscan_EPS(N,D,fileEPS )
%fileEPS: .txt address
%N: number of lines to read
%D: Decimation factor
fid = fopen(fileEPS);
format = '%f\t%f';
C = textscan(fid, format, N, 'CollectOutput', true);% this variable exceeds memory capacity
d = downsample(C{1},D);
plot(d);
fclose(fid);
end
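A quick back-of-the-envelope estimate shows why "C" runs out of memory before downsample is reached. This sketch assumes a 60 GB file with about 19 bytes per line (the figure used in the fread version below) and two values per line stored as 8-byte doubles; textscan's own temporary buffers would only add to this.

```matlab
% Rough estimate of the RAM the textscan approach needs for a whole file.
% Assumed figures: a 60 GB file, ~19 bytes per line, 2 doubles per line.
fileBytes     = 60e9;                       % size of the text file on disk
charsPerLine  = 19;                         % assumed bytes per line (digits, tab, newline)
nLines        = fileBytes / charsPerLine;   % about 3.16e9 lines
bytesInMemory = nLines * 2 * 8;             % two 8-byte doubles per line
fprintf('Lines: %.3g, RAM needed for C: %.1f GB\n', nLines, bytesInMemory / 1e9);
```

Under these assumptions "C" alone needs roughly 50 GB, which is why reading the whole file before decimating cannot work on a typical machine.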
How can I modify this line:
C = textscan(fid, format, N, 'CollectOutput', true);
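so that it reads only every other line, or every third line, etc., of the .txt file from disk into the variable "C", effectively decimating the data as it is read?
Any help is much appreciated.
Cheers, Jim
PS: Another approach I've been playing with uses "fread", but it runs into the same problem: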
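The most literal answer to "read only every D-th line" is to read the file line by line with fgetl and parse only the lines you keep, so memory holds nothing but the decimated rows. This is a minimal sketch, not a tuned solution: the synthetic file, D and N are hypothetical, and a per-line loop over billions of lines will be slow.

```matlab
% Minimal sketch: keep every D-th line of a two-column text file using fgetl,
% so only the kept rows are ever stored. File name, D and N are hypothetical;
% a small synthetic file is written first so the example runs on its own.
fileEPS = [tempname '.txt'];
fidw = fopen(fileEPS, 'w');
fprintf(fidw, '%f\t%f\n', [1:10; 101:110]);   % 10 lines: "1<TAB>101", "2<TAB>102", ...
fclose(fidw);

D = 3;                        % decimation factor: keep lines 1, 1+D, 1+2D, ...
N = 10;                       % maximum number of lines to scan
d = zeros(ceil(N / D), 2);    % preallocate only the rows that will be kept
kept = 0;
fid = fopen(fileEPS, 'r');
for k = 1:N
    tline = fgetl(fid);
    if ~ischar(tline)         % fgetl returns -1 at end of file
        break;
    end
    if mod(k - 1, D) == 0     % keep this line, skip the next D-1
        kept = kept + 1;
        d(kept, :) = sscanf(tline, '%f').';   % parse both columns of the kept line
    end
end
fclose(fid);
d = d(1:kept, :);             % trim any unused preallocated rows
delete(fileEPS);
```

With D = 3 this keeps lines 1, 4, 7 and 10 of the synthetic file.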
function [d] = fread_EPS(N,D,fileEPS)
%N: number of lines to read
%D: decimation factor
%fileEPS: location of the .txt file
%read in the data as characters
fid = fopen(fileEPS);
c = fread(fid,N*19,'*char');% each line of the .txt has 19 characters (a partial last line breaks the reshape below)
fclose(fid);
%parse the characters into floating-point numbers
f = sscanf(c,'%f');
%reshape into two columns: reshape to 2 rows, rotate 90, flip vertically
x = flipud(rot90(reshape(f,2,[])));
%decimate each column separately, because decimate only accepts vectors
d = [decimate(x(:,1),D), decimate(x(:,2),D)];
end
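The reshape, rotate and flip chain in fread_EPS is equivalent to a plain transpose, so reshape(f,2,[]).' would give the same two-column matrix more directly. A small check on stand-in numbers (the values of f are hypothetical):

```matlab
% Check that flipud(rot90(X)) is the same as the transpose X.'
f = (1:8).';                  % stand-in for the parsed numbers [x1 y1 x2 y2 ...]
X = reshape(f, 2, []);        % 2-by-4: row 1 = first column, row 2 = second column
A = flipud(rot90(X));         % rotate-then-flip, as in fread_EPS
B = X.';                      % plain transpose
disp(isequal(A, B))           % displays 1
```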
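Answer 0 (score: 2)
I believe textscan is the way to go, but you may need to add an intermediate step. Assuming you can comfortably read N lines at a time, I would use: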
textscan(fileID,formatSpec,N)
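It should be possible to read just one line at a time and decide whether to keep or discard it. Although that would use the least memory, I would try reading a few thousand lines per call to get reasonable performance.
Answer 1 (score: 0)
I ended up writing the following code based on Dennis Jaheruddin's suggestion. It seems to work on large .txt files (10 GB to 50 GB). The code was inspired by another post: Memory map file in MATLAB?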
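The chunked approach suggested above can be sketched as follows: read N lines per textscan call and keep every D-th line of the whole file, carrying the position of the stride from one chunk to the next so the spacing stays uniform across chunk boundaries. The synthetic file, chunk size and decimation factor are hypothetical; [C{1} C{2}] plays the same role as 'CollectOutput', true.

```matlab
% Sketch: chunked textscan that keeps every D-th line of the whole file.
% The synthetic file, N and D are hypothetical.
fileEPS = [tempname '.txt'];
fidw = fopen(fileEPS, 'w');
fprintf(fidw, '%f\t%f\n', [1:25; 101:125]);   % 25 lines of two columns
fclose(fidw);

N = 7;          % lines per chunk (a few thousand is more realistic)
D = 4;          % decimation factor
fid = fopen(fileEPS, 'r');
parts = {};     % decimated chunks, concatenated at the end
offset = 0;     % 0-based index, within the current chunk, of the next line to keep
while ~feof(fid)
    C = textscan(fid, '%f %f', N);
    chunk = [C{1} C{2}];          % n-by-2 block for this chunk
    n = size(chunk, 1);
    if n == 0
        break;                    % nothing left (e.g. only a trailing newline)
    end
    keep = (offset + 1):D:n;      % rows to keep, continuing the global stride
    parts{end + 1} = chunk(keep, :); %#ok<AGROW>
    offset = mod(offset - n, D);  % where the stride lands in the next chunk
end
fclose(fid);
d = vertcat(parts{:});
delete(fileEPS);
```

Only one chunk plus the decimated output is in memory at any time, so peak usage is set by N rather than by the file size.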
Nlines = 1e3;            % number of lines to read per cycle
sample_rate = 1/1.5e6;   % sample period in seconds (data sampled at 1.5 MHz)
DECE = 1;                % decimation factor
start = 40;              % start of plot time (s)
finish = 50;             % end of plot time (s)
fmt = '%f\t%f';
fid = fopen('C:\Users\James Archer\Desktop/RTL5_57.txt');
if fid < 0
    error('Could not open the data file.');
end
nRead = 0;               % number of lines read so far
hold on;
while ~feof(fid)
    C = textscan(fid, fmt, Nlines, 'CollectOutput', true);
    d = C{1};            % copy out the Nx2 data, then clear C immediately to free memory
    clearvars C;
    n = size(d, 1);
    if n == 0
        break;           % nothing left to read (e.g. a trailing newline)
    end
    % time stamps built from the integer line count, so TIME always has
    % exactly n elements and the first line of the file is at t = 0
    TIME = (nRead + (0:n-1)') * sample_rate;
    nRead = nRead + n;
    if TIME(end) > start && TIME(end) < finish
        plot(TIME(1:DECE:end), d(1:DECE:end,:)); % plot and decimate
    end
    if TIME(end) >= finish
        break;           % past the end of the plot window, stop reading
    end
end
hold off;
fclose(fid);
Older versions of MATLAB don't handle this code well and display the following message:
Caught std::exception Exception message is: bad allocation
but MATLAB 2013 works fine.
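When only a time window is plotted, as in Answer 1, the lines before the window do not need to be parsed at all. A possible variant, sketched under the assumptions of one line per sample, the first line at t = 0 and a constant sample period: skip the leading lines with textscan's 'HeaderLines' option and read only the lines inside the window. The synthetic file and window values are hypothetical, and textscan still has to scan past the skipped lines on disk.

```matlab
% Sketch: read only the lines inside [start, finish) seconds by skipping the
% leading lines with 'HeaderLines'. Assumes one line per sample, first line at
% t = 0 and a constant sample period; file and values are hypothetical.
fileEPS = [tempname '.txt'];
fidw = fopen(fileEPS, 'w');
fprintf(fidw, '%f\t%f\n', [0:99; 1000:1099]);  % 100 samples
fclose(fidw);

dt     = 0.1;                          % sample period in seconds
start  = 2.0;                          % start of the window (s)
finish = 3.0;                          % end of the window (s)
first  = round(start / dt);            % lines to skip (samples before start)
count  = round((finish - start) / dt); % lines inside the window

fid = fopen(fileEPS, 'r');
C = textscan(fid, '%f %f', count, 'HeaderLines', first);
fclose(fid);
d    = [C{1} C{2}];
TIME = (first + (0:size(d, 1) - 1)') * dt;   % time stamp of each line read
delete(fileEPS);
```

For the values in Answer 1 (1.5 MHz, 40 s to 50 s) this would keep about 1.5e7 lines, roughly 240 MB as doubles, which can then be decimated in memory or read in chunks as above.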