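I have a very large text file containing mostly numeric data with a header row (strings). It has 13 columns and 49,000 rows. Every cell contains a number (no empty cells, and all columns have the same number of rows). The data are solar wind measurements taken from a satellite. It looks like this: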
year,day,hr,min, sec, Np, Tp, Vx_gsm, Vy_gsm, Vz_gsm, Bx_gsm.....
YYYYxDDDxHHxMMxSSSSSSSxNNNNNNNxTTTTTTTTTTTTxVVVVVVVVVVxVVVVVVVVxVVVVVVVVVxB
2000 1 0 0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060
2000 1 0 1 47.0496 3.42400 -9999.90039 -9999.90039 .....
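The columns are actually perfectly aligned in the text file; I can't fit all 13 columns here, but that doesn't matter for now. Wherever "-9999.90039" appears, the value was corrupted for some reason. I need to replace each "-9999.90039" with NaN so that it is excluded from any calculations. I need to create arrays for the Np and Tp columns (for now), initialize them to zero, and find the minimum and maximum of each. It's a big file, so I think I need to do this in blocks. Also, since I only need the values from one numeric column per calculation, I figured I wouldn't need to change much and could use textscan. This is what I have so far: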
N = 10;% block size will have to be bigger but I wanted to test first
solarmax = fopen('ACE_magswe_64sec_2000.txt','r');
formatSpec = '%*d %*d %*d %*d %f %f %f %f %f %f %f %f %f';
% my prof said I need to initialize my variables by setting them to zero so
% I put this line below but I don't think it's right
k = 0;% not sure if this is necessary
while ~feof(solarmax)
k = k+1;% not sure if this is necessary
C = textscan(solarmax,formatSpec,N,'HeaderLines',2,'Delimiter','\t');
function y = changeval(num)
if (num('-9999.90039',num))
y = 'NaN';
end
end
fclose(solarmax);
Np = C{1,6};% not sure what to put here to call all values in that column
min(Np)
max(Np)
Tp = C{1,7};% same problem here
min(Tp)
max(Tp)
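So, in formatSpec I put asterisks next to the columns I want to ignore, and used HeaderLines to skip the first two lines. After that, I'm confused about how to set this problem up (my only previous programming experience was C++ in 2006!). Please help!!
Update
Marcin gave me some good suggestions, but I still have a few problems I need help with. Here is my code now: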
N = 1000;
solarmax = fopen('ACE_magswe_64sec_2000.txt','r+');
formatSpec = '%*d %*d %*d %*d %*f %f %f %f %f %f %f %f %f';
minNp = [];
maxNp = [];
minTp = [];
maxTp = [];
while ~feof(solarmax)
C = textscan(solarmax,formatSpec,N,'HeaderLines',2,'Delimiter','\t');
Np = cell2mat(C(:,1));
Tp = cell2mat(C(:,2));
Np(Np == -9999.90039) = NaN;
Tp(Tp == -9999.90039) = NaN;
minNp(end+1) = nanmin(Np);
maxNp(end+1) = nanmax(Np);
minTp(end+1) = nanmin(Tp);
maxTp(end+1) = nanmax(Tp);
end
fclose(solarmax);
nanmin(Np);
nanmax(Np);
nanmin(Tp);
nanmax(Tp);
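When I run it and remove the semicolons at the end, I get values from the min and max functions, but they are all NaN! I thought the min and max commands already ignore NaN, so I looked it up and found a suggestion to use nanmin / nanmax. However, that produced the same result. Is there something else I'm missing?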
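For reference, here is a small check of how min and max treat NaN. The vector `x` is made-up illustration data, not taken from the file. As far as I know, both functions skip NaN by default and return NaN only when every element is NaN, so an all-NaN result suggests that the block itself held no valid numbers.

```matlab
% Made-up values for illustration only
x = [3.274 NaN 3.424];

lo = min(x);              % NaN is ignored by default
hi = max(x);              % same for max

allNaN = min([NaN NaN]);  % NaN comes back only if every element is NaN

fprintf('lo = %.4f, hi = %.4f, allNaN = %f\n', lo, hi, allNaN);
```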
Answer 0 (score: 0)
Maybe this will help.
Sample data based on your question. I made it longer so that it has more than 10 rows.
year,day,hr,min, sec, Np, Tp, Vx_gsm, Vy_gsm, Vz_gsm
YYYYxDDDxHHxMMxSSSSSSSxNNNNNNNxTTTTTTTTTTTTxVVVVVVVVVVxVVVVVVVVxVVVVVVVVVxB
2000 1 0 0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -17.01060
2000 1 0 1 47.0496 3.42400 -9999.90039 -9999.90039 49.72259 -67.01060
2000 1 0 0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060
2000 1 0 1 47.0496 3.42400 -9999.90039 -9999.90039 49.72259 -67.01060
2000 1 0 0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060
2000 1 0 1 47.0496 3.42400 -9999.90039 -9999.90039 49.72259 -67.01060
2000 1 0 0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060
2000 1 0 1 47.0496 3.42400 -9999.90039 -9999.90039 49.72259 -67.01060
2000 1 0 0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060
2000 1 0 1 47.0496 3.42400 -9999.90039 -9999.90039 49.72259 -67.01060
2000 1 0 0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060
2000 1 0 1 47.0496 3.42400 -9999.90039 -9999.90039 49.72259 -67.01060
2000 1 0 0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060
2000 1 0 1 47.0496 3.42400 -9999.90039 -9999.90039 49.72259 -67.01060
2000 1 0 0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060
2000 1 0 1 47.0496 3.42400 -9999.90039 -9999.90039 49.72259 -67.01060
2000 1 0 0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060
2000 1 0 1 47.0496 3.42400 -9999.90039 -9999.90039 49.72259 -67.01060
2000 1 0 0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060
2000 1 0 1 47.0496 3.42400 -9999.90039 -9999.90039 49.72259 -67.01060
2000 1 0 0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060
2000 1 0 1 47.0496 3.42400 -9999.90039 -9999.90039 49.72259 -67.01060
2000 1 0 0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060
2000 1 0 1 47.0496 3.42400 -9999.90039 -9999.90039 49.72259 -67.01060
2000 1 0 0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060
2000 1 0 1 47.0496 3.42400 -9999.90039 -9999.90039 49.72259 -67.01060
2000 1 0 0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060
2000 1 0 1 47.0496 3.42400 -9999.90039 -9999.90039 49.72259 -67.01060
2000 1 0 0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060
2000 1 0 1 47.0496 3.42400 -9999.90039 -9999.90039 49.72259 -67.01060
2000 1 0 0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -267.01060
2000 1 0 1 47.0496 3.42400 -99.90039 -9999.90039 49.72259 -167.01060
Code modified according to the sample data and your description:
N = 10;% block size will have to be bigger but I wanted to test first
solarmax = fopen('ACE_magswe_64sec_2000.txt','r');
formatSpec = '%*d %*d %*d %*d %f %f %f %f %f %f';
minNp = []; % to store min values for each iteration of the while loop
maxNp = []; % to store max values for each iteration of the while loop
while ~feof(solarmax)
% read N=10 rows from the file
C = textscan(solarmax,formatSpec, N, 'HeaderLines',2,'Delimiter','\t');
% get the Np and Tp columns as column vectors
Np = cell2mat(C(:,2));
Tp = cell2mat(C(:,3));
% change -9999.90039 to NaN
Np(Np == -9999.90039) = NaN;
Tp(Tp == -9999.90039) = NaN;
% calculate min and max values for each set of N=10 values as you
% did. Probably need to store them, so do this:
minNp(end+1) = min(Np);
maxNp(end+1) = max(Np);
% do the same for Tp.
end
fclose(solarmax);
Each loop iteration yields one minimum and one maximum:
minNp =
3.2740 3.2740 3.2740
maxNp =
3.4240 3.4240 3.4240
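To get one overall minimum and maximum instead of one per block, the per-block results can be folded into running values. The following is only a sketch under some assumptions: the demo file, its four data rows, and the variable `fname` are made up so the example runs on its own. The header lines are skipped once with `fgetl` before the loop, because passing 'HeaderLines' to every `textscan` call would, as far as I can tell, skip two lines at the start of each block. The sentinel is matched with `< -9999`, which assumes no valid measurement is that low; keep the exact `==` comparison if that does not hold for your data.

```matlab
% Build a small, hypothetical demo file (two header lines, four data rows)
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'year,day,hr,min,sec,Np,Tp\n');
fprintf(fid, 'YYYYxDDDxHHxMMxSSSSSSSxNNNNNNNxTTTTTTTTTTTT\n');
fprintf(fid, '2000 1 0 0 43.0272 3.27400 289450.00000\n');
fprintf(fid, '2000 1 0 1 47.0496 3.42400 -9999.90039\n');
fprintf(fid, '2000 1 0 2 51.0720 -9999.90039 301200.00000\n');
fprintf(fid, '2000 1 0 3 55.0944 3.10000 295000.00000\n');
fclose(fid);

N = 2;                                      % block size (tiny, for the demo)
formatSpec = '%*d %*d %*d %*d %*f %f %f';   % keep only Np and Tp

fid = fopen(fname, 'r');
fgetl(fid); fgetl(fid);                     % skip the two header lines once

% Running extremes; Inf/-Inf are neutral starting values
minNp = Inf;  maxNp = -Inf;
minTp = Inf;  maxTp = -Inf;

while ~feof(fid)
    C = textscan(fid, formatSpec, N);       % read up to N rows
    Np = C{1};
    Tp = C{2};
    if isempty(Np), break; end              % nothing left to read

    % Assumption: anything below -9999 is the bad-data sentinel
    Np(Np < -9999) = NaN;
    Tp(Tp < -9999) = NaN;

    % min/max ignore NaN, so an all-NaN block leaves the running value alone
    minNp = min(minNp, min(Np));  maxNp = max(maxNp, max(Np));
    minTp = min(minTp, min(Tp));  maxTp = max(maxTp, max(Tp));
end
fclose(fid);
delete(fname);

fprintf('Np: [%g, %g]  Tp: [%g, %g]\n', minNp, maxNp, minTp, maxTp);
```

With only 49,000 rows, reading the whole file in a single `textscan` call (leaving out N) would probably also fit in memory, which removes the loop entirely.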