Question

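I need to delete the 2 lines above and the 4 lines below the line that starts with 'Possible'. That line itself should be deleted as well. I'm not used to working in the terminal, but the solution below seemed to be the most direct way to get what I want.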

The problem is that my file has more than 70000 lines, and it seems to be too much for grep:

$ grep -v "$(grep -E -a -B 2 -A 3 'Possible' structure)" structure >final
-bash: /bin/grep: Argument list too long

Is there another way to do this? Here is a fragment of the input file containing a section to be deleted:

gi|41|gb|JH9|.1(59-594) Length: 73 bp
Type: Glu   Anticodon: CTC at 33-35 (59424-59426)   Score: 22.64
Possible pseudogene:  HMM Sc=43.51  Sec struct Sc=-20.87
         *    |    *    |    *    |    *    |    *    |    *    |    *    |  
Seq: GCCCGTTTGGCTCAGTGGAtAGAGCATCGGCCCTCAgACCGTAGGGtCCTGGGTTCAGTTCTGGTCAAGGGCA
Str: >>>>.>...>>>>........<<<<.>>>>........<<<.<......>.>>.......<<.<..<.<<<<.

Answer 1

"The problem is that my file has more than 70000 lines, and it seems to be too much for grep."

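No. The real problem is that the command substitution $(grep -E -a -B 2 -A 3 'Possible' structure) expands to the full text of every matched block, and all of that text is passed to the outer grep as a single command-line argument, which exceeds the system's limit on argument size. You can use process substitution (a bash feature) instead, so that grep reads its patterns from a file with -f. Adding -F and -x makes each saved line match only an identical whole line, instead of being interpreted as a regular expression:

grep -v -x -F -f <(grep -E -a -B 2 -A 3 'Possible' structure) structure >final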
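To see the argument-size limit this answer refers to, here is a small illustrative sketch (not part of the fix). It builds one 200000-byte string, an arbitrary size, and passes it as a single argument to an external command, just as "$(grep ...)" did in the question. On Linux with 4 KiB pages, one argument may be at most 128 KiB (the kernel's MAX_ARG_STRLEN), so the exec should fail there; other systems have different limits, so the result may differ.

```shell
#!/bin/sh
# Illustration only: 200000 bytes is an arbitrary size chosen to exceed the
# 128 KiB per-argument limit Linux applies on systems with 4 KiB pages.
big=$(head -c 200000 /dev/zero | tr '\0' 'x')

# Storing the string in a shell variable is fine; the limit is checked only
# when the string is handed to an external program through exec ("env true").
if env true "$big" 2>/dev/null; then
  echo "exec succeeded: this system accepts a ${#big}-byte argument"
else
  echo "exec failed: a ${#big}-byte argument is too long here"
fi
```

Builtins such as echo never go through exec and are not affected, which is why the error in the question names /bin/grep.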

Answer 2

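I think you should split your command into two stages. In the first stage, you select the strings you don't want to see in the output (the inner grep) and save the result to a file. In the second stage, you filter the input with grep's -f flag (-f lets you specify the patterns in a file instead of on the command line).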
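A minimal sketch of those two stages, assuming a POSIX shell with mktemp available. The function name drop_possible_two_stage is hypothetical, and -F and -x are added (they are not in the answer) so each saved line is matched literally and only as a whole line:

```shell
#!/bin/sh
# Hypothetical helper: drop_possible_two_stage FILE
# Writes FILE to stdout without each line containing "Possible",
# the 2 lines before it and the 3 lines after it.
drop_possible_two_stage() {
  # Stage 1: save the unwanted lines to a temporary file instead of
  # expanding them onto the command line.
  patterns=$(mktemp) || return 1
  grep -a -B 2 -A 3 'Possible' "$1" > "$patterns"

  # Stage 2: read the patterns from that file with -f.
  # -F: each saved line is a fixed string, not a regex.
  # -x: only input lines identical to a saved line are removed.
  grep -a -v -x -F -f "$patterns" "$1"
  status=$?
  rm -f "$patterns"
  return "$status"
}

# Usage with the file names from the question:
# drop_possible_two_stage structure > final
```

One caveat, shared with the process-substitution version: this deletes every line identical to a saved line, even one outside a "Possible" block (a repeated ruler or sequence line, for example). The awk and sed answers work on line positions instead, so they do not have this problem.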

Answer 3

I don't think you can do this with grep. I'd suggest awk instead.

#!/usr/bin/awk -f

{
  # Record the current line in an array
  line[NR]=$0;
}

# If we saw "Possible" 3 lines ago, remove the last 6 lines from the array:
# the 2 before it, the match itself, and the 3 after it (up to the current line)
(NR-3) in line && line[NR-3]~/Possible/ {
  for (i=5;i>=0;i--) {
    delete line[NR-i];
  }
}

# Print the line from 5 lines back if it's still in the buffer, then remove it to save memory
(NR-5) in line {
  print line[NR-5];
  delete line[NR-5];
}

# And print anything remaining in the buffer
END {
  for (i=NR-4;i<=NR;i++) {
    if (i in line) {
      print line[i];
    }
  }
}

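With the "shebang" at the top, you can run this as a standalone script. Alternatively, if you really want to, you can squeeze it all onto a single command line.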
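The awk program above can be squeezed into one command. Here it is sketched inside a shell function whose name, drop_possible_awk, is hypothetical. In this version the deletion loop runs i = 5, 4, ..., 0, so it removes 6 lines: the 2 before the "Possible" line, the line itself, and the 3 after it, matching grep -B 2 -A 3 in the question.

```shell
#!/bin/sh
# Hypothetical wrapper: drop_possible_awk FILE...
# Same rules as the script: buffer lines, delete the 6-line block around
# "Possible" once the 3rd line after it arrives, print lines 5 behind.
drop_possible_awk() {
  awk '{line[NR]=$0} (NR-3) in line && line[NR-3]~/Possible/{for(i=5;i>=0;i--) delete line[NR-i]} (NR-5) in line{print line[NR-5]; delete line[NR-5]} END{for(i=NR-4;i<=NR;i++) if(i in line) print line[i]}' "$@"
}

# Usage with the file names from the question:
# drop_possible_awk structure > final
```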

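Because we run your input data through a 5-line sliding window, memory use does not grow with the input, so processing a dataset of any length should not be a problem: 70000 lines, 7 million lines, and so on.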

Answer 4

You can try this sed command:

sed 'N;/^[^\n]*\n[^\n]*$/N; /.*\n.*\n.*Possible/{$q;N;N;N;d};P;D;' structure > final
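The same sed program can be spread out one command per line with comments, which makes the 3-line window easier to follow. This is a sketch inside a hypothetical wrapper named drop_possible_sed, and it assumes GNU sed: [^\n] is a GNU extension, and GNU sed prints the pattern space when N runs out of input, which this script relies on to flush the last lines.

```shell
#!/bin/sh
# Hypothetical wrapper: drop_possible_sed FILE...
# The sed program from the answer, one command per line (GNU sed).
drop_possible_sed() {
  sed '
    # Grow the pattern space into a window of 3 lines.
    N
    /^[^\n]*\n[^\n]*$/N
    # If "Possible" is on the third line of the window, the window holds
    # the 2 lines before it plus the match itself.
    /.*\n.*\n.*Possible/{
      # On the last input line there is nothing more to read: quit,
      # printing the window as it is.
      $q
      # Append the 3 lines after the match, then delete all 6 unprinted.
      N
      N
      N
      d
    }
    # Otherwise print the oldest line, drop it, and rerun the script on
    # the remaining 2 lines without reading new input.
    P
    D
  ' "$@"
}

# Usage with the file names from the question:
# drop_possible_sed structure > final
```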
