Question

I have an array t that specifies the numbers of the lines I want to read from file.txt. So my code should look something like this:

data a;
   do i = 1 to dim(t);
      infile "C:\sas\file.txt" firstobs = t(i) obs = t(i);
      input x1-x10;
      output;
   end;
run;

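Of course, this approach (firstobs) only works when the numbers are constants, not array elements. How can I do this with an array (which could also be read from the same file, from its first line)?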

For example, if file.txt looks like this:

2 4 6 . . . . . . .
2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4 
5 5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6 6

then I want the output to be:

2 2 2 2 2 2 2 2 2 2
4 4 4 4 4 4 4 4 4 4
6 6 6 6 6 6 6 6 6 6

Answer 1

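It sounds like the first line contains the list of rows to keep. It would probably be easier to read that list from a separate file, but you can work with a single file. You didn't mention how you know the number of data columns or the maximum number of row numbers that can appear on the first line. For now, let's assume you can set those numbers in macro variables.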

Let's put your example data into a file:

options parmcards=tempdata ;
filename tempdata temp;
parmcards;
2 4 6 . . . . . . .
2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4 
5 5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6 6
;

Now let's read it into a dataset.

%let ncol=9 ;
%let maxrows=1000;
data want ;
  infile tempdata truncover ;
  array rows (&maxrows) _temporary_;
  if _n_=1 then do i=1 by 1 until (rows(i)=.);
    input rows(i) @;
    drop i;
  end;
  else do;
    input x1-x&ncol;
    if whichn(_n_,of rows(*)) then output;
  end;
run;
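
If the number of columns is not known in advance, it could be derived from the file instead of being hard-coded with %let ncol=9. The following is only a sketch under two assumptions that the question does not confirm: every data line holds the same number of values, and the values are separated by blanks. It would run after the tempdata fileref is created and before the data step that builds want.

data _null_;
  infile tempdata firstobs=2 obs=2;   * read only the second line of the file;
  input;                              * load the line into _INFILE_ without creating variables;
  call symputx('ncol', countw(_infile_, ' '));   * count blank-separated values;
run;
%put ncol=&ncol;

With the example data above this sets ncol to 9, the same value the %let statement assigns.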

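If the other lines of the file contain invalid data that would cause the INPUT statement to raise errors, you can avoid reading those lines with a small change to the ELSE block.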

  else do;
    input @;
    if whichn(_n_,of rows(*)) then do;
      input x1-x&ncol;
      output;
    end;
  end;

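If you find that there are often many records at the end of the file that you don't need, you can add this line to the end of the data step so that it stops once the last wanted row has been read.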

  if _n_ > max(of rows(*)) then stop;

Answer 2

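This is similar to Tom's answer, but it does not attempt to read the data on the rows that are not wanted. That may work better when the skipped rows are formatted differently from the wanted rows. It uses Tom's parmcards and structure so that you can see the differences more easily.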

options parmcards=tempdata ;
filename tempdata temp;
parmcards;
2 4 6 . . . . . . .
2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4 
5 5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6 6
;

%let ncol=9 ;
%let maxrows=1000;
data want ;
  infile tempdata truncover end=eof;
  array rows (&maxrows) _temporary_;
  do i=1 by 1 until (rows(i)=.);  *read in first line, just like Toms answer;
    input rows(i) @;
    drop i;
  end;
  input ;  * stop inputting on the first line;
           * Here you may need to use CALL SORTN to sort row array if it is not already sorted;
  _currow = 2;              * current row indicator;
  do _i = 1 to dim(rows);   * iterate through row array;
    if rows[_i]=. then leave; * leave if row array is empty;
    do while (_currow lt rows[_i] and not eof);  * skip rows not in row array;
      input;
      _currow = _currow + 1;
    end;
    input x1-x&ncol;    * now you know you are on a desired row, so input it;
    output;             * and output it;
    _currow = _currow + 1;
  end;
  stop;               * all wanted rows are read, so do not start another data step iteration;
  drop _currow _i;
run;

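As noted above, you may need CALL SORTN if the array is not already sorted (that is, if the row numbers are out of order). Be aware that CALL SORTN places missing values first, so after sorting, the LEAVE on a missing element would have to become CONTINUE.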
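
The ordering that CALL SORTN produces can be checked with a small sketch. The five-element array and its initial values here are hypothetical and only serve to illustrate the behavior; they are not part of the answer's program.

data _null_;
  array rows (5) _temporary_ (6 . 2 . 4);   * hypothetical unsorted row list;
  call sortn(of rows(*));                   * sorts ascending, missing values first;
  do i = 1 to dim(rows);
    r = rows(i);
    put i= r=;                              * write each element to the log;
  end;
run;

The log should show the two missing values in positions 1 and 2, followed by 2, 4 and 6.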

Answer 3

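If your file is structured (that is, it uses the same delimiter throughout and holds one continuous 'line' of input data), the approach below should work. I'm sure you could tune it to be more efficient, but I've added comments to explain what each part does. I'd also recommend reading the INFILE documentation for an explanation of the _infile_ automatic variable and other ways of manipulating the input buffer. Also, if your input data file itself needs to be split into separate lines, you will need to adjust this.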

filename in_data 'C:\sas\file.txt';

data out_data (keep=x1-x10);
    infile in_data;
    array obs(10) x1-x10;

    /*read the first value; the whole line is now available in _infile_*/
    input fn;

    /*get the number of values based on the delimiter*/
    count = countw(_infile_, ' ');

    /*iterate through the values*/
    do i = 1 to count;

        /*convert the current value to a number*/
        rec = input(scan(_infile_, i, ' '), 32.);

        /*set every array element to that value*/
        do j = 1 to dim(obs);
            obs(j) = rec;
        end;

        /*output to dataset*/
        output out_data;
    end;
run;

Input

2 4 6 7 8 9 10 11 2 3

Output

x1  x2  x3  x4  x5  x6  x7  x8  x9  x10
2   2   2   2   2   2   2   2   2   2
4   4   4   4   4   4   4   4   4   4
6   6   6   6   6   6   6   6   6   6
7   7   7   7   7   7   7   7   7   7
8   8   8   8   8   8   8   8   8   8
9   9   9   9   9   9   9   9   9   9
10  10  10  10  10  10  10  10  10  10
11  11  11  11  11  11  11  11  11  11
2   2   2   2   2   2   2   2   2   2
3   3   3   3   3   3   3   3   3   3

Hope this helps.

Answer 4

OK, I've got it. Assuming I know the number of columns (10) and the number of rows (10), I can get what I want with the following code:

data a;
     w=1;
     infile "C:\sas\file.txt" n=10;
     input #w x1-x10;
     array x(*) x1-x10;
     array t(10) _temporary_;
     do i=1 to 10;
         if(x(i)^=.) then t(i)=x(i);
         else leave;
     end;
     do j=1 to i-1;
         w=t(j);
         input #w x1-x10;
         output;
     end;
     stop;
run;

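What remains is to do the same thing without knowing the number of rows and columns. This way I read only the rows I am interested in, rather than reading every row and outputting only the ones I need.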

Answer 5

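The program will probably be much easier to maintain if you simply read the whole matrix into a dataset and then use the row numbers to select the data you want. Your file would probably need to hold hundreds of thousands of observations before the time saved by not reading the whole file becomes noticeable.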

Here is one way to select the rows using the POINT= option of the SET statement.

options parmcards=tempdata ;
filename tempdata temp;
parmcards;
2 4 6 
2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4 
5 5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6 6
;

data rows;
  infile tempdata obs=1 ;
  input row @@;
  row=row-1;
run;
proc import datafile="%sysfunc(pathname(tempdata))" dbms=dlm out=full replace;
   getnames=no;
   delimiter=' ';
   datarow=2;
run;
data want ;
  set rows ;
  pointer=row ;
  set full point=pointer ;
run;
proc print; run;
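
Once the full matrix and the row list are both datasets, the selection can also be written as a join. This is only an alternative sketch, not part of the answer above: the dataset names full_numbered and want_sql are made up for illustration, and it relies on the rows and full datasets created by the code above.

data full_numbered;
  set full;
  row = _n_;   * position in FULL, matching the adjusted ROW values in ROWS;
run;

proc sql;
  create table want_sql as
  select f.*
  from full_numbered as f
       inner join rows as r
       on f.row = r.row
  order by f.row;
quit;

Unlike the POINT= version, which returns rows in the order they are listed on the first line, this returns them in file order.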
