Question

I have a file in the following format:

    Name Salary Age
    bob  10000  18
    sally 5555  20
    @not found 4fjfjhdfjfnvndf
    @not found 4fjfjhdfjfnvndf
    9/2-10/2

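At random points in the file there are bursts of 4-6 lines of random characters. The files have 2 million lines. I would like to know whether the infile statement will automatically skip these random bursts of lines, or whether I have to go into the file and remove those lines myself.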

Answer 1

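You will probably have to deal with them in some fashion. If you have missover or truncover on your infile statement, the junk lines won't do any harm to the surrounding rows (but you must have one of the two, otherwise a short line can cause the next line to be pulled in to fill the remaining variables). You will still end up with a garbage row in your dataset that you need to handle.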

The quick and dirty way would be something like this:

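data have;
/* note: with dsd, two consecutive delimiters are read as a missing value */
infile "blah.txt" dlm=' ' dsd lrecl=32767 truncover;
input name $ salary age;
/* drop rows where neither numeric field could be read */
if missing(salary) and missing(age) then delete;
run;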

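That works as long as the garbage produces missing values for the numeric variables. However, you will probably get some not-so-nice invalid-data messages in your log, and it won't catch everything if the garbage can itself be numeric. (If the garbage is entirely numeric, you could test whether name is numeric.)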
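As a rough sketch of the two points above, the variant below quiets the log and adds the name test. It relies on the `??` modifier of the input statement, which suppresses the invalid-data messages. The rule that a genuine name is never made up only of digits is an assumption about the data, not something stated in the question.

```sas
data have;
    infile "blah.txt" dlm=' ' dsd lrecl=32767 truncover;
    /* ?? suppresses the invalid-data messages and keeps _ERROR_ from being set */
    input name $ salary ?? age ??;
    /* drop rows where neither numeric field could be read */
    if missing(salary) and missing(age) then delete;
    /* assumption: a real name never consists only of digits */
    if notdigit(strip(name)) = 0 then delete;
run;
```

Because `??` hides every conversion problem, it may be worth running once without it on a sample of the file to see what is actually being rejected.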

The better way is to preprocess _infile_. This is a little more 'advanced', but it is definitely a good approach.

data have;
infile "blah.txt" dlm=' ' dsd lrecl=32767 truncover;
input @;  *read the line into _infile_ and hold it without assigning any variables;
if countw(_infile_) ne 3 then delete;  *if there are not exactly 3 "words" then delete it;
if notdigit(scan(_infile_,2)) or notdigit(scan(_infile_,3)) then delete; *if the 2nd or 3rd word contains non-digit characters then delete;
input name $ salary age;
run;

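Both methods need to be tailored to your data and may need some tweaking. For example, if salary and age can legitimately be missing, both of these methods will delete rows you don't want deleted.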
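One possible tweak for the missing-value case is to validate the whole line with a regular expression instead of countw and notdigit. This is a swapped-in technique rather than the method above, and it is only a sketch under several assumptions: fields are separated by one or more spaces (so dsd is dropped), salary and age are whole numbers, and a missing value is written as a single period.

```sas
data have;
    infile "blah.txt" lrecl=32767 truncover;
    input @;  /* load the line into _infile_ and hold it */
    /* keep only lines shaped like: one word, then two fields that are */
    /* either all digits or a lone period (assumed missing-value marker) */
    if not prxmatch('/^\s*\S+\s+(\d+|\.)\s+(\d+|\.)\s*$/', _infile_) then delete;
    input name $ salary age;
run;
```

The header line and the junk lines from the question fail the pattern and are dropped, while a row such as `sally . 20` would be kept with a missing salary. The pattern would need loosening if salaries can contain decimals or names can contain spaces.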

How to skip certain lines in a file that are random character sequences
