Question

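I have a very large text file containing mostly numeric data with headers (strings). It has 13 columns and 49,000 rows. Every cell contains a numeric value (no empty cells, no columns with differing numbers of rows). It is solar wind data collected by a satellite. It looks like this: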

year,day,hr,min, sec,  Np,     Tp,          Vx_gsm,    Vy_gsm,  Vz_gsm,   Bx_gsm.....
YYYYxDDDxHHxMMxSSSSSSSxNNNNNNNxTTTTTTTTTTTTxVVVVVVVVVVxVVVVVVVVxVVVVVVVVVxB
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 .....

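The columns are actually perfectly aligned in the text file, and I could not include all 13 columns here, but that doesn't matter for now. Wherever there is a "-9999.90039", it means the value was corrupted for some reason. What I need to do is replace the "-9999.90039" values with NaN so that they are not included in any calculations. I need to create arrays from the Np and Tp columns (for now), initialize them to zero, and find the minimum and maximum of each. It's a big file, so I think I need to do this in blocks. Also, since I only need to access the values in one numeric column per calculation, I shouldn't need to change much and can use textscan. This is what I have so far: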

N = 10;% block size will have to be bigger but I wanted to test first
solarmax = fopen('ACE_magswe_64sec_2000.txt','r');
formatSpec = '%*d %*d %*d %*d %f %f %f %f %f %f %f %f %f';
% my prof said I need to initialize my variables by setting them to zero so
% I put this line below but I don't think it's right
k = 0;% not sure if this is necessary
while ~feof(solarmax)
    k = k+1;% not sure if this is necessary
    C = textscan(solarmax,formatSpec,N,'HeaderLines',2,'Delimiter','\t');
    function y = changeval(num)
    if (num('-9999.90039',num))
        y = 'NaN';
    end
end
fclose(solarmax);
Np = C{1,6};% not sure what to put here to call all values in that column
min(Np)
max(Np)
Tp = C{1,7};% same problem here
min(Tp)
max(Tp)

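So, I put asterisks in formatSpec next to the columns I want to ignore, and used HeaderLines to skip the first two lines. After that, I'm confused about how I should set this up (my only previous programming experience was C++ in 2006!). Please help!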

Update

Marcin gave me some good advice, but I still have some issues I need help with. Here is my code now:

N = 1000;
solarmax = fopen('ACE_magswe_64sec_2000.txt','r+');
formatSpec = '%*d %*d %*d %*d %*f %f %f %f %f %f %f %f %f';

minNp = [];
maxNp = [];

minTp = [];
maxTp = [];

while ~feof(solarmax)
    C = textscan(solarmax,formatSpec,N,'HeaderLines',2,'Delimiter','\t');
    Np = cell2mat(C(:,1)); 
    Tp = cell2mat(C(:,2));

    Np(Np == -9999.90039) = NaN;
    Tp(Tp == -9999.90039) = NaN;

    minNp(end+1) = nanmin(Np);
    maxNp(end+1) = nanmax(Np); 

    minTp(end+1) = nanmin(Tp);
    maxTp(end+1) = nanmax(Tp);
end
fclose(solarmax);
nanmin(Np);
nanmax(Np);
nanmin(Tp);
nanmax(Tp);

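When I run this with the semicolons at the end removed, I get values from the min and max functions, but they all turn out to be NaN! I thought the min and max commands already ignore NaN, so I looked it up and found the suggestion to use nanmin / nanmax. However, that produced the same result. Is there something else I'm missing?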

Answer 1

Maybe this will help.

Example data based on your question. I made it longer so that it has more than 10 rows.

year,day,hr,min, sec,  Np,     Tp,          Vx_gsm,    Vy_gsm,  Vz_gsm
YYYYxDDDxHHxMMxSSSSSSSxNNNNNNNxTTTTTTTTTTTTxVVVVVVVVVVxVVVVVVVVxVVVVVVVVVxB
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -17.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -267.01060 
2000   1  0  1 47.0496 3.42400 -99.90039  -9999.90039 49.72259 -167.01060

Modified code based on the example data and your description:

N = 10;% block size will have to be bigger but I wanted to test first

solarmax = fopen('ACE_magswe_64sec_2000.txt','r');

formatSpec = '%*d %*d %*d %*d %f %f %f %f %f %f';

minNp = []; % to store min values for each iteration of the while loop
maxNp = []; % to store max values for each iteration of the while loop

while ~feof(solarmax)    

    % read N=10 rows from the file
    C = textscan(solarmax,formatSpec, N, 'HeaderLines',2,'Delimiter','\t');

    % get the Np and Tp columns as column vectors
    Np = cell2mat(C(:,2)); 
    Tp = cell2mat(C(:,3)); 

    % change -9999.90039 to NaN
    Np(Np == -9999.90039) = NaN;
    Tp(Tp == -9999.90039) = NaN;

    % calculate the min and max values for each set of N=10 values as you
    % did. You probably need to store them, so do this:
    minNp(end+1) = min(Np);
    maxNp(end+1) = max(Np);

    % do the same for Tp.
end
fclose(solarmax);

Each iteration of the loop yields one minimum and one maximum:

minNp =

    3.2740    3.2740    3.2740


maxNp =

    3.4240    3.4240    3.4240
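
If a single overall minimum and maximum is wanted rather than one pair per chunk, the per-chunk values can be folded into running values. The following is only a minimal sketch under stated assumptions. It writes a small hypothetical demo file (7 columns instead of the real 13) so that it can run on its own. It skips the two header lines once with fgetl before the loop instead of passing 'HeaderLines' to every textscan call; my understanding is that textscan applies 'HeaderLines' on each call, which would drop data rows in every chunk, but treat that as something to verify on your MATLAB version. It also relies on min and max ignoring NaN by default.

```matlab
% Hypothetical demo file: two header lines followed by three data rows.
% (The real file has 13 columns; only 7 are written here for brevity.)
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'year,day,hr,min,sec,Np,Tp\n');
fprintf(fid, 'second header line\n');
fprintf(fid, '2000 1 0 0 43.0272 3.27400 289450.00000\n');
fprintf(fid, '2000 1 0 1 47.0496 3.42400 -9999.90039\n');
fprintf(fid, '2000 1 0 2 51.0720 -9999.90039 301200.00000\n');
fclose(fid);

N = 2;                                        % rows per chunk (tiny, for the demo)
badValue = -9999.90039;                       % marker for corrupted values
formatSpec = '%*d %*d %*d %*d %*f %f %f';     % keep only Np and Tp

fid = fopen(fname, 'r');
fgetl(fid);                                   % skip header line 1 (once)
fgetl(fid);                                   % skip header line 2 (once)

% Running extremes; Inf and -Inf are neutral starting values
minNp = Inf;  maxNp = -Inf;
minTp = Inf;  maxTp = -Inf;

while ~feof(fid)
    C = textscan(fid, formatSpec, N);
    Np = C{1};
    Tp = C{2};
    if isempty(Np)
        break;                                % nothing parsed: avoid looping forever
    end

    % Replace the bad-value marker with NaN
    Np(Np == badValue) = NaN;
    Tp(Tp == badValue) = NaN;

    % min and max skip NaN entries by default
    minNp = min([minNp; Np]);  maxNp = max([maxNp; Np]);
    minTp = min([minTp; Tp]);  maxTp = max([maxTp; Tp]);
end
fclose(fid);
delete(fname);                                % remove the demo file

fprintf('Np: min %g, max %g\n', minNp, maxNp);
fprintf('Tp: min %g, max %g\n', minTp, maxTp);
```

For the real file, the demo-file part would be dropped, fname set to 'ACE_magswe_64sec_2000.txt', and formatSpec extended to match all 13 columns, as in the question.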

Array of column vectors (numeric) from a large text file - initializing, replacing bad cells
