Remove the previous and next lines in Perl

Date: 2015-10-16 15:13:29

Tags: perl

I have the following file:

@TWEETY:150:000000000-ACFKE:1:2104:27858:17965
AAATTAGCAAAAAACAATAACAAAACTGGGAAAATGCAATTTAACAACGAAAATTTTCCGAGAACTTGAAAGCGTACGAAAACGATACGCTCC
+
D1FFFB11FDG00EE0FFFA1110FAA1F/ABA0FGHEGDFEEFGDBGGGGFEHBFDDG/FE/EGH1@GF@F0AEEEEFHGGFEFFCEC/>EE
@TWEETY:150:000000000-ACFKE:1:1105:22044:20029
AAAAAATATTAAAACTACGAATGCATAAATTATTTCGTTCGAAATAAACTCACACTCGTAACATTGAACTACGCGCTCC
+
CCFDDDFGGGGGGGGGGHGGHHHHGHHHHHHHHHHHHHHHGHHGHHHHHHHHHHHHHGHGHGGHHHHHHGHHEGGGGGG
@TWEETY:150:000000000-ACFKE:1:2113:14793:7182
TATATAAAGCGAGAGTAGAAACTTTTTAATTGACGCGGCGAGAAAGTATATAGCAACAAGCGAGCACCCGCTCC
+
BBFFFFFGGGGFFGGFGHHHHHHHHHHHHHHHHHGGAEEEAFGGGHHFEGHHGHHHHHGHHGGGGFHHGG?EEG
@TWEETY:150:000000000-ACFKE:1:2109:5013:22093
AAAAAAATAATTCATATCGCCATATCGACTGACAGATAATCTATCTATAATCATAACTTTTCCCTCGCTCC
+
DAFAADDGF1EAGG3EG3A00ECGDFFAEGFCHHCAGHBGEAGBFDEDGGHBGHGFGHHFHHHBDG?/FA/
@TWEETY:150:000000000-ACFKE:1:2106:25318:19875

+
CCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

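The lines come in groups of four: a name starting with @TWEETY, a string of letters, a + character, and another string of letters.

The second and fourth lines should have the same number of characters. But in some cases the second line is empty, as in the last four lines above.

In those cases I want to get rid of the whole block (the empty line, the line before it, and the two lines after it). I have just started using Perl and have been trying to write a script for this, but I am struggling. Does anyone have some feedback? Thanks!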

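4 answers:

Answer 0: (score: 2)

Keep an array buffer of the last four lines. When it is full, check the second line, print the lines or not accordingly, empty the buffer, and repeat.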

#!/usr/bin/perl
use warnings;
use strict;

my @buffer;

sub output {
    print @buffer unless 1 == length $buffer[1];
    @buffer = ();
}

while (<>) {
    if (4 == @buffer) {
        output();
    }
    push @buffer, $_;
}
output();  # Don't forget to process the last four lines.

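Answer 1: (score: 1)

Yes. First look at $/ and set it so that you can process one block at a time. I'd suggest that in your example you could treat @ as the record separator.

Then iterate over your records with a while loop. For example: while ( <> ) {

Use split on \n to turn the current block into an array of lines.

Run your test on the relevant line, and print, or not, depending on whether it passes.

If you get stuck on this, I'm sure that posting your code along with the specific problem you are having would be welcome here.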
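
The steps above could be sketched roughly as follows. This is only an illustration, not the answerer's code: the helper name filter_records and the in-memory sample are made up, and it assumes every record starts with @TWEETY. It uses '@TWEETY' rather than a bare '@' as the separator because quality lines can themselves contain '@' characters (the first record in the question does).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): returns the records read
# from a filehandle, dropping those whose second line is empty.
sub filter_records {
    my ($fh) = @_;

    # Assumption: every record starts with '@TWEETY'. A bare '@' would
    # also split on '@' characters inside quality strings.
    local $/ = '@TWEETY';

    my @kept;
    while (my $record = <$fh>) {
        chomp $record;                 # strip the trailing separator
        next unless length $record;    # text before the first separator is empty

        # Turn the block into an array of lines.
        my @lines = split /\n/, $record;

        # Line 2 of the block is the sequence; skip the block if it is empty.
        next unless defined $lines[1] && length $lines[1];

        push @kept, "$/$record";       # put the separator back on the front
    }
    return @kept;
}

# Small made-up sample: the second record has an empty sequence line.
my $sample = "\@TWEETY:1\nACGT\n+\nFFFF\n\@TWEETY:2\n\n+\nFFFF\n";
open my $in, '<', \$sample or die "Cannot open in-memory sample: $!";
print filter_records($in);
close $in;
```

To run this on a real file, open that file instead of the in-memory string and pass its filehandle to the helper.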

Answer 2: (score: 0)

If you chunk the data correctly, this becomes almost trivial.

#!/usr/bin/perl

use strict;
use warnings;

# Use '@TWEETY' as the record separator to make it
# easy to chunk the data.
local $/ = '@TWEETY';

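while (<DATA>) {
  # Strip the trailing record separator so that it is not
  # printed twice when we put it back on the front.
  chomp;

  # The first entry will be empty (as the separator
  # is the first thing in the file). Skip that record.
  next unless /\S/;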

  # Skip any records with two consecutive newlines
  # (as they will be the ones with the empty line 2)
  next if /\n\n/;

  # Print the remaining records
  # (with $/ stuck back on the front)
  print "$/$_";
}

__DATA__
@TWEETY:150:000000000-ACFKE:1:2104:27858:17965
AAATTAGCAAAAAACAATAACAAAACTGGGAAAATGCAATTTAACAACGAAAATTTTCCGAGAACTTGAAAGCGTACGAAAACGATACGCTCC
+
D1FFFB11FDG00EE0FFFA1110FAA1F/ABA0FGHEGDFEEFGDBGGGGFEHBFDDG/FE/EGH1@GF@F0AEEEEFHGGFEFFCEC/>EE
@TWEETY:150:000000000-ACFKE:1:1105:22044:20029
AAAAAATATTAAAACTACGAATGCATAAATTATTTCGTTCGAAATAAACTCACACTCGTAACATTGAACTACGCGCTCC
+
CCFDDDFGGGGGGGGGGHGGHHHHGHHHHHHHHHHHHHHHGHHGHHHHHHHHHHHHHGHGHGGHHHHHHGHHEGGGGGG
@TWEETY:150:000000000-ACFKE:1:2113:14793:7182
TATATAAAGCGAGAGTAGAAACTTTTTAATTGACGCGGCGAGAAAGTATATAGCAACAAGCGAGCACCCGCTCC
+
BBFFFFFGGGGFFGGFGHHHHHHHHHHHHHHHHHGGAEEEAFGGGHHFEGHHGHHHHHGHHGGGGFHHGG?EEG
@TWEETY:150:000000000-ACFKE:1:2109:5013:22093
AAAAAAATAATTCATATCGCCATATCGACTGACAGATAATCTATCTATAATCATAACTTTTCCCTCGCTCC
+
DAFAADDGF1EAGG3EG3A00ECGDFFAEGFCHHCAGHBGEAGBFDEDGGHBGHGFGHHFHHHBDG?/FA/
@TWEETY:150:000000000-ACFKE:1:2106:25318:19875

+
CCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

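Answer 3: (score: 0)

Thanks everyone for the feedback! It was all very useful. Following your suggestions, I explored all the options and learned about the unless statement. The simplest solution, given my existing code, was just to add an unless statement at the end.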

### Write to output, but remove non-desired Gs
open OUT, '>', $outfile or die "Cannot open $outfile: $!";

my @accorder = @{$store0{"accorder"}};
foreach my $acc (@accorder){
# retrieve seq(2nd line) and qual(4th line)
my $seq = $store0{$acc}{"seq"};
my $qual = $store0{$acc}{"qual"};

# clean out polyG at end
$seq =~ s/G{3,}.{0,1}$//;
my $lenseq = length($seq);
my $lenqual = length($qual);   
my $startqual = $lenqual - $lenseq;
$qual = substr($qual, 0, $lenseq); 

#the above was in order to remove multiple G characters at the end of the
#second line, which is what led to empty lines (lines that were made up of
#only Gs got cut out)

# print to output, unless sequence has become empty
unless($lenseq == 0){  #this is the unless statement I added
print OUT "\@$acc\n$seq\n+\n$qual\n";
}
}
close(OUT);
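
To make the effect of the trimming step easier to see in isolation, here is a small self-contained sketch. It is not the full script above: the helper name trim_polyg and the two reads are made up for illustration, and only the substitution, the substr call, and the final check are taken from the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper mirroring the trimming step above: strip a trailing
# run of three or more Gs (plus at most one following character) from the
# sequence, then cut the quality string to the same length.
sub trim_polyg {
    my ($seq, $qual) = @_;
    $seq =~ s/G{3,}.{0,1}$//;
    $qual = substr($qual, 0, length $seq);
    return ($seq, $qual);
}

# Made-up reads, for illustration only.
my @reads = (
    [ 'read1', 'ACGTACGGGGG', 'FFFFFFFFFFF' ],   # trimmed to ACGTAC
    [ 'read2', 'GGGGGGGG',    'FFFFFFFF'    ],   # nothing left after trimming
);

for my $read (@reads) {
    my ($acc, $seq, $qual) = @$read;
    ($seq, $qual) = trim_polyg($seq, $qual);

    # Skip the whole record when the sequence has become empty.
    next unless length $seq;
    print "\@$acc\n$seq\n+\n$qual\n";
}
```

Only read1 is printed here. read2 consists entirely of Gs, so its sequence is empty after trimming and the record is skipped rather than written with a blank second line.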