Column vector arrays (numeric) from a large text file: initializing, replacing bad cells

Time: 2014-05-26 00:59:15

Tags: arrays matlab

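I have a very large text file that contains mostly numeric data, with a header (strings). It has 13 columns and 49,000 rows. Every cell contains a numeric value (no empty cells, no columns with a different number of rows). It is solar wind data obtained from a satellite. It looks like this: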

year,day,hr,min, sec,  Np,     Tp,          Vx_gsm,    Vy_gsm,  Vz_gsm,   Bx_gsm.....
YYYYxDDDxHHxMMxSSSSSSSxNNNNNNNxTTTTTTTTTTTTxVVVVVVVVVVxVVVVVVVVxVVVVVVVVVxB
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 .....

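The columns are actually perfectly aligned in the text file, but I can't include all 13 of them here; that doesn't matter for now. Wherever a value is "-9999.90039", it means the value is corrupted for some reason. What I need to do is replace the "-9999.90039" values with NaN so that they are not included in any calculations. I need to create arrays of the column data for Np and Tp (for now), initialize them to zero, and find the minimum and maximum of each. It is a large file, so I think I need to do this in blocks. Also, because I only need the values from one numeric column at a time (per calculation), I don't think I need to change much, and I can use textscan. Here's what I have so far: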

 N = 10;% block size will have to be bigger but I wanted to test first
solarmax = fopen('ACE_magswe_64sec_2000.txt','r');
formatSpec = '%*d %*d %*d %*d %f %f %f %f %f %f %f %f %f';
% my prof said I need to initialize my variables by setting them to zero so
% I put this line below but I don't think it's right
k = 0;% not sure if this is necessary
while ~feof(solarmax)
    k = k+1;% not sure if this is necessary
    C = textscan(solarmax,formatSpec,N,'HeaderLines',2,'Delimiter','\t');
    function y = changeval(num)
    if (num('-9999.90039',num))
        y = 'NaN';
    end
end
fclose(solarmax);
Np = C{1,6};% not sure what to put here to call all values in that column
min(Np)
max(Np)
Tp = C{1,7};% same problem here
min(Tp)
max(Tp)

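So, in formatSpec I put asterisks next to the columns I want to ignore, and I use HeaderLines to skip the first two lines. After that, I'm confused about how to set this up (my only previous programming experience is C++ from 2006!). Please help!!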
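
Two points that may help with the C{1,6} line and the changeval function. Fields marked with %* are read and thrown away, so they get no cell in C; the cell indices count only the fields that are kept. With the first formatSpec above (which keeps sec), sec is C{1}, Np is C{2} and Tp is C{3}, so C{1,6} is the sixth kept field (Vz_gsm in the header), not Np. Second, replacing bad values needs no helper function: logical indexing does it in one line, and the replacement has to be the numeric NaN, not the string 'NaN'. A minimal sketch on two sample rows given as a string (only the first seven columns, and a formatSpec that also skips sec, so Np is C{1}); the variable name bad and the 1e-3 tolerance are my own assumptions:

```matlab
% Two sample rows, first seven columns only (year day hr min sec Np Tp).
txt = sprintf(['2000   1  0  0 43.0272 3.27400 289450.00000\n', ...
               '2000   1  0  1 47.0496 3.42400 -9999.90039\n']);

% %*d and %*f read a field and discard it, so only the two %f fields get a cell:
% C{1} is Np (6th column of the file) and C{2} is Tp (7th column).
C = textscan(txt, '%*d %*d %*d %*d %*f %f %f');
Np = C{1};
Tp = C{2};

% Replace the fill value with NaN via logical indexing (numeric NaN, not 'NaN').
% A small tolerance is used instead of == in case parsing differs in the last bit.
bad = -9999.90039;
Np(abs(Np - bad) < 1e-3) = NaN;
Tp(abs(Tp - bad) < 1e-3) = NaN;

% min and max skip NaN by default.
fprintf('Np: %g to %g\n', min(Np), max(Np));
fprintf('Tp: %g to %g\n', min(Tp), max(Tp));
```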

UPDATE

Marcin gave me some good advice, but there are still a few problems I need help with. Here is my code now:

N = 1000;
solarmax = fopen('ACE_magswe_64sec_2000.txt','r+');
formatSpec = '%*d %*d %*d %*d %*f %f %f %f %f %f %f %f %f';

minNp = [];
maxNp = [];

minTp = [];
maxTp = [];

while ~feof(solarmax)
    C = textscan(solarmax,formatSpec,N,'HeaderLines',2,'Delimiter','\t');
    Np = cell2mat(C(:,1)); 
    Tp = cell2mat(C(:,2));

    Np(Np == -9999.90039) = NaN;
    Tp(Tp == -9999.90039) = NaN;

    minNp(end+1) = nanmin(Np);
    maxNp(end+1) = nanmax(Np); 

    minTp(end+1) = nanmin(Tp);
    maxTp(end+1) = nanmax(Tp);
end
fclose(solarmax);
nanmin(Np);
nanmax(Np);
nanmin(Tp);
nanmax(Tp);

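When I run it with the semicolons removed from the last four lines, I get values from the min and max functions, but they all come out as NaN! I thought min and max already ignore NaN, so I looked it up, and nanmin/nanmax were suggested. However, they gave the same result. Is there something else I'm missing?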
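
One likely cause (hedged, since the whole file isn't shown): the four calls after the loop use Np and Tp, which at that point hold only the last block that was read. If every Np or Tp value in that last block is -9999.90039, nanmin and nanmax return NaN no matter what the rest of the file contains. The per-block results are already being collected in minNp, maxNp, minTp and maxTp, so the whole-file values come from reducing those vectors. A sketch with made-up blocks standing in for successive textscan reads:

```matlab
% Made-up Np blocks standing in for successive textscan reads; the last block is all bad.
blocks = {[3.274; 3.424], [3.1; -9999.90039], [-9999.90039; -9999.90039]};

minNp = [];
maxNp = [];
for b = 1:numel(blocks)
    Np = blocks{b};
    Np(Np == -9999.90039) = NaN;
    minNp(end+1) = min(Np);   % NaN for a block with no valid values
    maxNp(end+1) = max(Np);
end

% After the loop, Np is only the last block, so this is NaN.
lastOnly = min(Np);

% The whole-file answer comes from the per-block vectors; min/max skip the NaN entry.
overallMin = min(minNp);
overallMax = max(maxNp);
fprintf('last block: %g, whole file: %g to %g\n', lastOnly, overallMin, overallMax);
```

The same reduction applies to minTp and maxTp.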

1 Answer:

Answer 0 (score: 0)

Maybe this will help.

Sample data based on your question. I made it longer so that there are more than 10 rows:

year,day,hr,min, sec,  Np,     Tp,          Vx_gsm,    Vy_gsm,  Vz_gsm
YYYYxDDDxHHxMMxSSSSSSSxNNNNNNNxTTTTTTTTTTTTxVVVVVVVVVVxVVVVVVVVxVVVVVVVVVxB
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -17.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -267.01060 
2000   1  0  1 47.0496 3.42400 -99.90039  -9999.90039 49.72259 -167.01060

Your code, modified based on the sample data and your description:

N = 10;% block size will have to be bigger but I wanted to test first

solarmax = fopen('ACE_magswe_64sec_2000.txt','r');

formatSpec = '%*d %*d %*d %*d %f %f %f %f %f %f';

minNp = []; % to store min values for each iteration of the while loop
maxNp = []; % to store max values for each iteration of the while loop

while ~feof(solarmax)    

    % read N=10 rows from the file
    C = textscan(solarmax,formatSpec, N, 'HeaderLines',2,'Delimiter','\t');

    % get the Np and Tp columns as column vectors
    Np = cell2mat(C(:,2)); 
    Tp = cell2mat(C(:,3)); 

    % change -9999.90039 to NaN
    Np(Np == -9999.90039) = NaN;
    Tp(Tp == -9999.90039) = NaN;

    % calculate min or max values for each set of N=10 values as you
    % did. Probably need to store them, so do this:
    minNp(end+1) = min(Np);
    maxNp(end+1) = max(Np);        

    % Do the same for Tp.
end
fclose(solarmax);

Each iteration of the loop gives one min and one max value:

minNp =

    3.2740    3.2740    3.2740


maxNp =

    3.4240    3.4240    3.4240
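
For the full file, two further changes may be worth making to the code above. Passing 'HeaderLines',2 on every textscan call likely makes each call skip lines at the current file position too, not only the header at the top of the file, so some data rows can be silently dropped between blocks; reading the two header lines once with fgetl before the loop avoids that. And instead of storing one value per block, a running min/max can be updated as each block arrives. This is also one sensible reading of "initialize your variables": start the running minimum at Inf and the running maximum at -Inf, because starting both at zero would make 0 the minimum of an all-positive column such as Np. A sketch under those assumptions: it writes its own small test file (seven columns, 25 rows) to a temporary path so it runs standalone, and it matches the fill value with a 1e-3 tolerance rather than ==; the names fname, bad and nRows are mine:

```matlab
% Write a small test file in the question's layout (first seven columns only).
% fname is a temporary stand-in for ACE_magswe_64sec_2000.txt.
fname = [tempname() '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'year,day,hr,min, sec,  Np,     Tp\n');
fprintf(fid, 'YYYYxDDDxHHxMMxSSSSSSSxNNNNNNNxTTTTTTTTTTTT\n');
for r = 1:25
    if mod(r, 2) == 1
        fprintf(fid, '2000   1  0  0 43.0272 3.27400 289450.00000\n');
    else
        fprintf(fid, '2000   1  0  1 47.0496 3.42400 -9999.90039\n');
    end
end
fclose(fid);

N = 10;                                   % rows per block
formatSpec = '%*d %*d %*d %*d %*f %f %f'; % keep only Np and Tp
bad = -9999.90039;                        % fill value for corrupted readings

fid = fopen(fname, 'r');
fgetl(fid);                               % skip the two header lines once,
fgetl(fid);                               % before the loop starts

NpMin = Inf;  NpMax = -Inf;               % running extremes
TpMin = Inf;  TpMax = -Inf;
nRows = 0;
while ~feof(fid)
    C = textscan(fid, formatSpec, N);     % continues from the current position
    Np = C{1};
    Tp = C{2};
    if isempty(Np)
        break;                            % nothing left to parse
    end
    nRows = nRows + numel(Np);

    % Tolerance match instead of == in case the parsed value differs in the last bit.
    Np(abs(Np - bad) < 1e-3) = NaN;
    Tp(abs(Tp - bad) < 1e-3) = NaN;

    % Two-argument min/max ignore NaN, so an all-bad block leaves the running values unchanged.
    NpMin = min(NpMin, min(Np));  NpMax = max(NpMax, max(Np));
    TpMin = min(TpMin, min(Tp));  TpMax = max(TpMax, max(Tp));
end
fclose(fid);
delete(fname);

fprintf('%d rows: Np %g..%g, Tp %g..%g\n', nRows, NpMin, NpMax, TpMin, TpMax);
```

With the real file, the test-file section would be replaced by fopen('ACE_magswe_64sec_2000.txt','r'), N raised well above 10, and formatSpec extended to all 13 columns while still keeping only Np and Tp, e.g. '%*d %*d %*d %*d %*f %f %f %*f %*f %*f %*f %*f %*f'.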