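I have chat data that I want to read in one entry at a time. Each time a person hits "send" should be one observation. The problem is that there are line breaks (Enter) within the text, and I can't get SAS to keep reading them into the same observation. Here is some dummy data: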
08:23 - Greg: Hi!
08:24 - Sue: Hello
08:24 - Greg: How are you?
08:25 - Sue: Just fine :)
How are you then?
08:26 - Greg: All good.
I want this to be 5 observations, but I can only get SAS to read it as 7 obs. The desired data set should look like this:
Obs VAR1
1 08:23 - Greg: Hi!
2 08:24 - Sue: Hello
3 08:24 - Greg: How are you?
4 08:25 - Sue: Just fine :) How are you then?
5 08:26 - Greg: All good.
The code I'm playing with:
data testing;
infile datalines ;
input var1 $60. ;
datalines;
08:23 - Greg: Hi!
08:24 - Sue: Hello
08:24 - Greg: How are you?
08:25 - Sue: Just fine :)
How are you then?
08:26 - Greg: All good.
;
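But the actual file is a txt file and has many more irregularities than the dummy example above. I tried using a trailing @ but couldn't get it to work the way I wanted. Maybe a trailing @ isn't what I'm after. Any suggestions on how to do this?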
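Answer 0 (score: 1)
Try this.
Keep a running variable that holds the message collected so far. If the current value has a timestamp in its first 4 characters, output the running variable and reset it to "". Then append the current value to the running variable. Finally, on the last row, output whatever is left.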
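data testing(keep=line);
  set testing end=last;
  format line $2000.;
  retain line;
  /* A ":" within the first 4 characters marks the start of a new message */
  if _n_ > 1 then do;
    if index(substr(var1,1,4),":") then do;
      output;                     /* write the message collected so far */
      line = "";
    end;
  end;
  put line= var1=;                /* debug: state before appending */
  line = catx(" ",line , var1);   /* append the current line */
  put line=;                      /* debug: state after appending */
  if last then do;
    output;                       /* flush the final message */
    put "AT LAST";
  end;
run;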
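The test on the first four characters would also fire on a continuation line that happens to contain a colon early on (for example a line beginning "Hi: "). Below is a minimal sketch of a stricter check, assuming every message starts with hh:mm, a space, a hyphen and a space, as in the dummy data. The data set name COLLAPSED and the flag IS_START are made up for illustration; TESTING and VAR1 come from the question.

```sas
/* Sketch only: same retain-and-append idea, stricter start-of-message test. */
/* Assumes data set TESTING with character variable VAR1 (from the question). */
data collapsed(keep=line);
  set testing end=last;
  length line $2000;
  retain line;
  /* 1 when the line begins with "hh:mm - ", otherwise 0 */
  is_start = prxmatch('/^\d\d:\d\d - /', var1) > 0;
  if is_start and _n_ > 1 then do;
    output;              /* write the message collected so far */
    line = '';
  end;
  line = catx(' ', line, var1);   /* blank lines add nothing */
  if last then output;            /* flush the final message */
run;
```

If the real file uses a different timestamp layout (seconds, dates, AM/PM), the pattern would need to be adjusted to match it.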
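Answer 1 (score: 0)
I tried without success to find a solution at the raw data input stage. In any case, I hope this is useful to you; it post-processes the strings after they have been read: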
data testing;
infile datalines ;
input var1 $60.;
datalines;
08:23 - Greg: Hi!
08:24 - Sue: Hello
08:24 - Greg: How are you?
08:25 - Sue: Just fine :)
How are you then?
08:26 - Greg: All good.
;
data testing01;
  set testing;
  retain row 0;
  /* A line starts a new message when it begins with a valid hh:mm time. */
  /* The ?? modifier suppresses invalid-data notes for non-numeric text. */
  if 0 le input(substr(var1,1,2), ?? 8.) le 23
     and substr(var1,3,1) = ':'
     and 0 le input(substr(var1,4,2), ?? 8.) le 59 then row = row + 1;
run;
proc transpose data=testing01 out=testing02;
var var1;
by row;
run;
data testing03;
  length final $2000;
  set testing02;
  array str[*] col:;
  do i = 1 to dim(str);
    final = catx(' ', final, str[i]);   /* catx skips blank values */
  end;
  drop col: row i _name_;
run;
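Once each line carries a message number, the transpose step is not strictly needed. A sketch of the same concatenation with BY-group processing, assuming TESTING01 with the variables ROW and VAR1 as built above; the output name TESTING_BY is hypothetical:

```sas
/* Sketch only: collapse the lines of each message without PROC TRANSPOSE. */
/* Assumes TESTING01 is ordered by ROW, which holds because ROW never decreases. */
data testing_by(keep=final);
  set testing01;
  by row;
  length final $2000;
  retain final;
  if first.row then final = '';      /* start a new message */
  final = catx(' ', final, var1);    /* append the current line */
  if last.row then output;           /* one observation per message */
run;
```

This keeps everything in one pass and avoids the wide COL1, COL2, ... intermediate data set.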
Answer 2 (score: 0)
filename FT15F001 temp;
data testing ;
  infile FT15F001 end=eof ;
  length string $6323;
  retain string;
  input @;
  if _n_=1 then string=_infile_;
  /* line does not start with a digit: continuation of the current message */
  else if not missing(_infile_) and anydigit(_infile_)^=1 then string=catx(' ',string,_infile_);
  /* line starts with a digit: write the finished message and start a new one */
  else if not missing(_infile_) and anydigit(_infile_)=1 then do;
    output;
    string=_infile_;
  end;
  if eof then output;
PARMCARDS;
08:23 - Greg: Hi!
08:24 - Sue: Hello
08:24 - Greg: How are you?
08:25 - Sue: Just fine :)
How are you then?
08:26 - Greg: All good.
;
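Since the real data sits in a text file, the same logic can read it directly. This is a sketch under assumptions: the fileref CHAT and the data set name CHAT_MSGS are made up, the path is the one used in another answer in this thread, and a message is assumed to start with a digit in column 1.

```sas
/* Sketch only: read the chat file and build one observation per message. */
filename chat "c:\temp\chat.txt";     /* assumed location of the file */

data chat_msgs(keep=string);
  infile chat end=eof lrecl=32767;
  length string $6323;
  retain string;
  input;                               /* load the record into _INFILE_ */
  if anydigit(_infile_) = 1 then do;   /* first character is a digit: new message */
    if not missing(string) then output;
    string = _infile_;
  end;
  else string = catx(' ', string, _infile_);   /* continuation; blank lines add nothing */
  if eof and not missing(string) then output;  /* flush the final message */
run;
```

A continuation line that itself begins with a digit would be treated as a new message, so a stricter timestamp test may be needed for messier files.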
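Answer 3 (score: 0)
There are many ways to do this, depending on your exact use case.
Here is a regular expression approach. It won't work if you have more than 32767 characters in total, unless you have a way to split the file into chunks, but it works well for smaller files. The general approach can still be used even if you read one line at a time.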
data test;
  infile "c:\temp\chat.txt" recfm=f lrecl=32767;
  input @;
  /* One message: a timestamp, then lazily everything up to the next timestamp or the end */
  rx_find = prxparse('~(\d\d:\d\d -.*?)(?=(?:\b\d\d:\d\d)|$)~ios');
  rc_find = prxmatch(rx_find,_infile_);
  pos2=0;
  /* Skip the loop entirely when nothing matches */
  if rc_find > 0 then do until (pos2=0);
    call prxposn(rx_find,1,pos,len);    /* capture group 1 of the latest match */
    found=substr(_infile_,pos,len);
    output;
    start=pos+len;                      /* resume the search after this message */
    call prxnext(rx_find,start,-1,_infile_,pos2,len2);
  end;
  stop;
run;