I have a bunch of text files that need cleaning. I use UNIX bash, so AWK or grep would be fine.
The text files look like this:
1766 1789
1764 1790
1762 1849
0
1357 1817
1366 1857
0
360 42
352 95
0
293 142
302 181
delete-this
0
302 181
0
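What I want is to delete all the "0" and "delete-this" lines, as well as the two-column lines where only one such line, or three such lines, occur together.
The result should look like this: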
1766 1789
1762 1849
1357 1817
1366 1857
360 42
352 95
293 142
302 181
Many thanks!
More info: the difference between column 2 of line 1 and column 2 of line 2 must be > 1; otherwise, line 2 must be deleted.
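Here is the "More info" rule worked through on the first group. Column 2 runs 1789, 1790, 1849, and 1790 - 1789 = 1 is not > 1, so "1764 1790" is the line to drop. The small awk check below is only a sketch: the variable names prev and have_prev are illustrative, and any non-two-column line (such as "0") is assumed to end a group. It prints each two-column line together with its difference from the previous one:

```shell
#!/bin/sh
# First group of the sample input from the question, plus its "0" separator.
input='1766 1789
1764 1790
1762 1849
0'

# For every two-column line after the first one in a group, print the line
# and the difference between its column 2 and the previous column 2.
# Any other line (such as "0") ends the group.
diffs=$(printf '%s\n' "$input" | awk '
    NF == 2 && have_prev { print $0, "diff=" ($2 - prev) }
    NF == 2 { prev = $2; have_prev = 1; next }
    { have_prev = 0 }
')
printf '%s\n' "$diffs"
```

This prints "1764 1790 diff=1" and "1762 1849 diff=59". Only the first difference fails the > 1 test.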
Answer 0 (score: 2)
This one is tricky, or at least hard to understand, but here is another attempt:
awk '/[0-9]+ [0-9]+/ {a[++t]=$0;b[t]=$2;next} {if (t>=2) for (i=1;i<=t;i++) {if (b[i]-c!=1) print a[i];c=b[i]};t=0}' file
1766 1789
1762 1849
1357 1817
1366 1857
360 42
352 95
293 142
302 181
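The question mentions a whole bunch of files. One way to apply the one-liner to each of them is a plain shell loop. This is a sketch under some assumptions: the files are taken to end in .txt, and each cleaned copy is written to a .clean file next to its original (a hypothetical naming scheme). The mktemp directory and its two sample files are there only so the example runs on its own.

```shell
#!/bin/sh
# Demo setup only: a scratch directory with two sample files.
dir=$(mktemp -d)
printf '%s\n' '1766 1789' '1764 1790' '1762 1849' 0 > "$dir/a.txt"
printf '%s\n' '293 142' '302 181' delete-this 0 '302 181' 0 > "$dir/b.txt"

# Run the one-liner once per file. "${f%.txt}.clean" strips the .txt suffix,
# so the result lands next to the original, which stays untouched.
for f in "$dir"/*.txt; do
    awk '/[0-9]+ [0-9]+/ {a[++t]=$0;b[t]=$2;next} {if (t>=2) for (i=1;i<=t;i++) {if (b[i]-c!=1) print a[i];c=b[i]};t=0}' "$f" > "${f%.txt}.clean"
done

# Collect the cleaned output, then remove the scratch directory.
result=$(cat "$dir"/*.clean)
rm -r "$dir"
printf '%s\n' "$result"
```

For real files, replace "$dir"/*.txt with your own glob. Each file gets a fresh awk process, so the counters t and c start out empty for every file.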
How it works:
awk '
/[0-9]+ [0-9]+/ {              # if the line has two columns of numbers:
    a[++t]=$0                  # increment counter "t" and store the whole line in array "a"
    b[t]=$2                    # store column 2 in array "b"
    next                       # skip to the next input line
}
{                              # any other line ("0", "delete-this") ends a group:
    if (t>=2)                  # if the group has two or more number lines,
        for (i=1;i<=t;i++) {   # loop through the stored lines
            if (b[i]-c!=1)     # unless column 2 is exactly 1 more than column 2 of the previous line,
                print a[i]     # print the stored line
            c=b[i]             # remember this column 2 in variable "c"
        }
    t=0                        # reset counter "t" for the next group
}' file
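There are two edge cases the script above does not handle. First, if a file does not end with a "0" or "delete-this" line, the last group is never printed, because groups are only printed when a non-number line arrives. Second, "c" carries over from the previous group, so the first line of a group is dropped if its column 2 happens to be exactly 1 more than the last column 2 of the previous group. Below is one possible variant, offered as a sketch and not as the answerer's code. The function name "flush" is hypothetical. The variant moves the printing into a function, keeps "c" local to it, always keeps the first line of a group, and also calls the function from an END block. The lines "10 1850" and "20 1900" in the input are made up to trigger the carry-over case.

```shell
#!/bin/sh
# Sample input: no trailing "0", and a made-up group whose first column 2
# (1850) is exactly 1 more than the last column 2 of the group before it.
input='1766 1789
1764 1790
1762 1849
0
10 1850
20 1900
0
293 142
302 181'

out=$(printf '%s\n' "$input" | awk '
    # Print the buffered group (only if it has at least two lines),
    # dropping a line whose column 2 is exactly 1 more than the column 2
    # of the line before it, then empty the buffer. i and c are local.
    function flush(   i, c) {
        if (t >= 2)
            for (i = 1; i <= t; i++) {
                if (i == 1 || b[i] - c != 1)
                    print a[i]
                c = b[i]
            }
        t = 0
    }
    /[0-9]+ [0-9]+/ { a[++t] = $0; b[t] = $2; next }
    { flush() }
    END { flush() }
')
printf '%s\n' "$out"
```

This prints "1766 1789", "1762 1849", "10 1850", "20 1900", "293 142" and "302 181". On the same input, the original script would drop "10 1850" and never print the final group.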