Question

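Using SAS 9.3, I want to extract the part of each string that lies between the "." (dot) and the " (double quote) that follows the dot. For example, the result for the first line below should be f2015_cnt_cont_line.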

      <characteristic abc="[2015].f2015_cnt_cont_line" xxxxxxxx="8129" />
      <characteristic abc="[2015].f2015_dbt_cont_line" xxxxxxxx="8134" />
      <characteristic abc="[2015].f2015_ctl_tot_acct_bal" xxxxxxxx="8133" />
      <characteristic abc="[2015].f2015_cnt_comb_line" xxxxxxxx="8118" />
      <characteristic abc="[2015].f2015_dbt_comb_line" xxxxxxxx="8138" />

Does anyone have an example I could use?

Thanks, Dan

Answer 1

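The regular expression that matches your pattern is \.(.*?)\". This means: find a dot (a special character in regular expressions, so it is escaped with a backslash); then any characters (the ? makes the match non-"greedy", so it captures as few characters as possible); then a double quote.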

Using this example from the SAS documentation, something like this should work:

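data test;
   set _your_data_set;
   retain re;   /* keep the compiled pattern ID across rows */
   if _N_ = 1 then re = prxparse('/\.(.*?)\"/');   /* compile once, on the first row */
   /* capture group 1 is the text between the dot and the next quote */
   if prxmatch(re, var) then result = prxposn(re, 1, var);
run;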

(This assumes your data is in a variable named var.)
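
As a self-contained check, the sketch below runs the same pattern against two of the sample lines from the question. The dataset name demo, the declared lengths ($200 and $50), and the use of datalines are assumptions made for illustration; only the regular expression and the PRX calls come from the answer above.

```sas
data demo;
   length var $200 result $50;          /* lengths are assumed for this illustration */
   retain re;
   infile datalines truncover;
   input var $char200.;                 /* read the whole line, leading blanks included */
   if _N_ = 1 then re = prxparse('/\.(.*?)\"/');
   if prxmatch(re, var) then result = prxposn(re, 1, var);
   drop re;
   datalines;
      <characteristic abc="[2015].f2015_cnt_cont_line" xxxxxxxx="8129" />
      <characteristic abc="[2015].f2015_dbt_cont_line" xxxxxxxx="8134" />
;
run;

proc print data=demo;
   var result;
run;
```

If this behaves as intended, the printed result column should contain f2015_cnt_cont_line and f2015_dbt_cont_line.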

Answer 2

Instead of using the PRX functions, the following might be sufficient:

text=scan(scan(line,2,"."),1,'"');

This assumes the text is stored in a variable named line.

Answer 3

Here is one way:

inner = SCAN(SUBSTR(line,INDEX(line,'.')+1),1,'"');

The inner SUBSTR function skips to the column after the first dot; the outer SCAN function returns the first word delimited by a double quote.
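
To make the two steps visible, the sketch below stores the intermediate SUBSTR result before applying SCAN. The dataset name demo3, the variable name tail, and the declared lengths are assumptions for illustration and do not appear in the answer.

```sas
data demo3;
   length line tail $200 inner $50;     /* assumed lengths */
   line  = '<characteristic abc="[2015].f2015_cnt_cont_line" xxxxxxxx="8129" />';
   tail  = substr(line, index(line, '.') + 1);   /* everything after the first dot */
   inner = scan(tail, 1, '"');                   /* text up to the next double quote */
   put tail=;
   put inner=;
run;
```

Here tail begins with f2015_cnt_cont_line" and inner keeps only the part before that quote.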

Answer 4

This worked for me:

SUBSTR(
t1.field, 
index(t1.field,'.')+1,
(index(t1.field,'"')-index(t1.field,'.')-1)
)
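
Note that INDEX(t1.field,'"') locates the first double quote in the field, so the expression above presumably relies on the field having no quote before the dot. In the full lines shown in the question, abc=" comes before the dot, so a variant that starts looking for the quote after the dot may be needed. The sketch below is one such variant using FIND with a start position; the dataset name demo4 and the variables dot_pos and quote_pos are assumptions and not part of the answer.

```sas
data demo4;
   length field $200 inner $50;         /* assumed lengths */
   field = '<characteristic abc="[2015].f2015_cnt_cont_line" xxxxxxxx="8129" />';
   dot_pos   = index(field, '.');                /* position of the first dot */
   quote_pos = find(field, '"', dot_pos + 1);    /* first double quote after that dot */
   /* only extract when both delimiters exist and enclose at least one character */
   if dot_pos > 0 and quote_pos > dot_pos + 1 then
      inner = substr(field, dot_pos + 1, quote_pos - dot_pos - 1);
   put inner=;
run;
```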

Extracting a fragment of a string using PRXPARSE and PRXSUBSTR
