Question

My data is in the following format:

#@ <id_wxyz_1>
A line written after this.

#@ <id_123>
A line written after this one also.

#@ <id_wxyz_2>
One more line.

#@ <id_yex_9>
Another line.

Now I want to delete 2 lines: each line of the form #@ <...> that contains "wxyz", plus the line that follows it. The sample output I want is:

#@ <id_123>
A line written after this one also.

#@ <id_yex_9>
Another line.

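Is there a Linux command that can do this, or an efficient way to do the same in Python? I know I can selectively delete a single line with grep, sed, etc. But is it possible to selectively delete 2 lines with Linux commands?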

Edit: The answers given are very good, but they do not work for input of the following form:

#@ <id_wxyz_1>
A line written after this.

#@ <id_wxyz_2>
A line written after this.

#@ <id_wxyz_3>
A line written after this.

#@ <id_wxyz_4>
A line written after this.

#@ <id_wxyzadded5>
A line written after this.

For the input above, I should get no output lines.

Edit again: I have another set of input:

#@ <id_wxyz0>
Line 1.
#@ <id_wxyz1>
line 2.
#@ <id_wxyz2> 
line 3.
#@ <id_wxyz3> 
line 4.
#@ <id_6>
line 5.

The output should be

#@ <id_6>
line 5.

Answer 1

You can do this with sed, for example:

/^#@ <.*wxyz.*>/ {
   N        # append the next line to the pattern space
   s/.*//   # clear the pattern space (header and the line after it)
   N        # append one more line
   /^\n$/ d # if that line was blank, delete it and start the next cycle (reading again)
   D        # otherwise delete up to the newline and start the next cycle with the rest
}

Note: for the second case, it actually still outputs one blank line.
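Since the question also asks about Python, the same idea (drop the matching header, the line after it, and the blank separator if one follows) could be sketched roughly as below. This is only an illustration under stated assumptions: the names `HEADER` and `drop_wxyz_blocks` are made up for this example, and the input is assumed to be a list of lines without trailing newlines.

```python
import re

# Hypothetical helper names; the pattern mirrors the sed address above.
HEADER = re.compile(r"^#@ <.*wxyz.*>")

def drop_wxyz_blocks(lines):
    """Drop each '#@ <...wxyz...>' header, the line after it,
    and one blank separator line if it immediately follows."""
    out = []
    i = 0
    while i < len(lines):
        if HEADER.match(lines[i]):
            i += 2  # skip the header and the line that follows it
            if i < len(lines) and lines[i].strip() == "":
                i += 1  # also skip the blank separator, when present
            continue
        out.append(lines[i])
        i += 1
    return out

sample = [
    "#@ <id_wxyz0>", "Line 1.",
    "#@ <id_wxyz1>", "line 2.",
    "#@ <id_6>", "line 5.",
]
print("\n".join(drop_wxyz_blocks(sample)))
# prints:
# #@ <id_6>
# line 5.
```

Because the blank separator is skipped only when it is actually there, this sketch should cope with both the blank-line-separated input and the input without separators.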

Answer 2

You can also use grep.

Example: given your input

$ cat t
#@ <id_wxyz_1>
A line written after this.

#@ <id_123>
A line written after this one also.

#@ <id_wxyz_2>
One more line.

#@ <id_yex_9>
Another line.

#@ <id_wxyz_1>
A line written after this.

#@ <id_wxyz_2>
A line written after this.

#@ <id_wxyz_3>
A line written after this.

#@ <id_wxyz_4>
A line written after this.

#@ <id_wxyzadded5>
A line written after this.

#@ <id_wxyz0>
Line 1.
#@ <id_wxyz1>
line 2.
#@ <id_wxyz2> 
line 3.
#@ <id_wxyz3> 
line 4.
#@ <id_6>
line 5.

you can run

$ grep -A1  --group-separator=""  -P '#[^_]*((?!wxyz).)*$' t
#@ <id_123>
A line written after this one also.

#@ <id_yex_9>
Another line.

#@ <id_6>
line 5.

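The regexp matches lines that contain a # and no wxyz after it, using Perl-like syntax (hence the -P option). -A1 adds the one line after each match to the output. The undocumented --group-separator="" option replaces the default -- that normally separates groups of lines when -A (or -B or -C) is used. Note that this last option is not available in all implementations.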
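For readers who prefer Python, the "keep headers without wxyz, plus the next line" logic can be approximated with the `re` module, which supports the same negative lookahead. The sketch below is an assumption-laden illustration, not a port of grep: `KEEP` and `keep_with_next` are hypothetical names, the pattern is a simplified variant anchored at the start of the line, and no group separators are emitted.

```python
import re

# Simplified, hypothetical variant of the grep pattern: a line starting with '#'
# in which no position begins the substring 'wxyz'.
KEEP = re.compile(r"^#((?!wxyz).)*$")

def keep_with_next(lines):
    """Keep every header line without 'wxyz' and the single line after it
    (roughly what grep -A1 does, minus the group separators)."""
    out = []
    take_next = False
    for line in lines:
        if KEEP.match(line):
            out.append(line)
            take_next = True   # the following line is trailing context
        elif take_next:
            out.append(line)
            take_next = False
    return out

lines = ["#@ <id_wxyz3> ", "line 4.", "#@ <id_6>", "line 5."]
print(keep_with_next(lines))
# prints: ['#@ <id_6>', 'line 5.']
```

The `(?!wxyz).` construct consumes one character at a time only where "wxyz" does not start, so the whole line can match only if "wxyz" never appears in it.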

Answer 3

With awk you can say:

awk '/^#@ <.*wxyz.*>/{getline;getline}1' filename

Edit: Based on your revised question, you can say:

sed '/^#@ <id_wxyz.*/,/^$/d' filename

Answer 4

You can also use awk. If a line matches, call getline twice to consume the following two lines, then use next to avoid printing them.

awk '/^#@[[:blank:]]+<.*wxyz.*>/ { getline; getline; next } { print }' infile

It yields:

#@ <id_123>
A line written after this one also.

#@ <id_yex_9>
Another line.

Update, to provide a solution for the OP's new edit:

awk  '
    BEGIN { RS = "#@" } 
    $1 ~ /[^[:space:]]/ && $1 !~ /<.*wxyz.*>/ { 
        sub(/\n[[:blank:]]*$/, "")
        print RS, $0 
    }
' infile

For the last example, it yields:

#@  <id_6>
line 5.
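The record-based idea (split the input on "#@" and keep only the records whose header lacks "wxyz") translates fairly directly to Python. The following is a minimal sketch under assumptions: `filter_records` and its parameters are hypothetical names, the whole input is assumed to fit in memory as one string, and, as with the awk version, a "#@" inside a body line would also be treated as a record boundary.

```python
def filter_records(text, marker="#@", needle="wxyz"):
    """Split text into records at each marker and return the records
    whose first line (the header) does not contain needle."""
    kept = []
    for record in text.split(marker):
        if not record.strip():
            continue  # skip the empty piece before the first marker
        header = record.split("\n", 1)[0]
        if needle in header:
            continue  # drop the header together with everything up to the next marker
        # Re-attach the marker and trim trailing newlines or blank separators.
        kept.append(marker + record.rstrip())
    return kept

text = "#@ <id_wxyz0>\nLine 1.\n#@ <id_wxyz1>\nline 2.\n#@ <id_6>\nline 5.\n"
for rec in filter_records(text):
    print(rec)
# prints:
# #@ <id_6>
# line 5.
```

Returning a list of records leaves the choice of separator (a blank line or none) to the caller, since the inputs in the question use both layouts.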

Delete 2 consecutive lines
