Question

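I have a data set with about 40 million rows, and I want to extract strings from 50 of its columns. I used an ordinary DATA step with an array to do this, but the extraction took more than 2 hours to finish.

I know how to use hash tables in SAS for simple joins or subsetting by specifying a lookup table first. Here, however, I would prefer to use a regular expression for the extraction. The code I currently use is shown below.

How can I do a hash table search across those 50 columns in SAS without a lookup table?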

data want;
   set have;
   array cols {*} $ col1 - col50;

   do i = 1 to dim(cols);
      if prxmatch('/F[0-9].*[123]/', cols[i])
         then output;
   end;
run;

Answer 1

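Wrapping the regular-expression pattern in a capture group lets PRXPOSN retrieve the matched text. The matches can be stored in a hash object, which is written to a data set once the last row has been processed.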

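data _null_;
   set have end=done;
   array cols {*} $ col1 - col50;
   length match $200;
   retain rxid;

   if _n_ = 1 then do;
     /* compile the pattern once; the parentheses form capture group 1 */
     rxid = prxparse('/(F[0-9].*[123])/');

     declare hash matches();
     matches.defineKey('match');    /* key names are passed as quoted strings */
     matches.defineData('match');   /* make sure the key is written by output() */
     matches.defineDone();
     call missing(match);
   end;

   do i = 1 to dim(cols);
      if prxmatch(rxid, cols[i]) then do;
        match = prxposn(rxid, 1, cols[i]);
        matches.replace();          /* one hash entry per distinct match */
      end;
   end;

   /* write the collected matches after the last row */
   if done then matches.output(dataset:'want_matches');
run;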
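
To try the approach on a small scale, the following is a minimal sketch that runs the same logic on a made-up three-column data set. The names have_demo and demo_matches, the sample values, and the use of only col1-col3 are assumptions for illustration and are not part of the original question.

```sas
/* Hypothetical three-column sample; the real data has col1-col50 */
data have_demo;
   infile datalines truncover;
   input (col1-col3) (:$20.);
   datalines;
F1abc2 xyz F9q3
abc F1abc2 none
;
run;

data _null_;
   set have_demo end=done;
   array cols {*} col1-col3;
   length match $200;
   retain rxid;

   if _n_ = 1 then do;
      /* compile once; parentheses define capture group 1 */
      rxid = prxparse('/(F[0-9].*[123])/');

      /* ordered:'yes' only makes the output order predictable */
      declare hash matches(ordered: 'yes');
      matches.defineKey('match');
      matches.defineData('match');
      matches.defineDone();
      call missing(match);
   end;

   do i = 1 to dim(cols);
      if prxmatch(rxid, cols[i]) then do;
         match = prxposn(rxid, 1, cols[i]);
         matches.replace();   /* duplicates overwrite the same key */
      end;
   end;

   if done then matches.output(dataset: 'demo_matches');
run;

proc print data=demo_matches;
run;
```

On this sample, demo_matches should hold two distinct values, F1abc2 and F9q3, even though F1abc2 occurs in two rows. Because the hash is keyed on the matched text, it deduplicates matches as a side effect; if every occurrence is needed rather than the distinct set, a plain output statement to a data set would be the simpler choice.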
