Question

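I have a tab-delimited text file, which I split into columns. Each of the first two columns contains an ID number.

I want to keep all rows whose ID numbers start with P or Q, and delete any row where column 1 or 2 holds any other ID or is blank.

For example, IDs to keep look like this: P12345 or Q12345. IDs to get rid of would be GAG123, CH123, etc., or just blank.

I can't figure out how to do this. I have tried splitting the lines into arrays and grepping for /^[PQ]/ on elements [0] and [1], along with various other things, but I must be doing something wrong.

I have tried the following code from TLP, but it doesn't work, and I know I must be doing something fundamentally wrong: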

#!/usr/bin/perl  

use warnings;
use strict;

open(FILE,"<myfile.txt"); 
my @LINES = <FILE>; 
open(my $outfile, '>', 'changedtxt');
my @wanted;

while (<FILE>) {
    my @fields = split('\t', $_);
    if ( $fields[0] =~ /^[PQ]/ and $fields[1] =~ /^[PQ]/ ) {
        push @wanted, $_;  
        print {$outfile} $_;    
    }
}
exit:

Answer 1

You can use awk to print the records whose first or second field starts with P or Q:

awk -F'\t' '$1 ~ /^[PQ]/ || $2 ~ /^[PQ]/ {print}' file
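To make the difference between "either column" and "both columns" concrete, here is a minimal sketch. The rows and the file name sample.tsv are made up for illustration; only the ID shapes come from the question.

```shell
# Build a small tab-separated sample (hypothetical data).
printf 'P12345\tQ12345\tkeep\n'  >  sample.tsv
printf 'GAG123\tP12345\tmixed\n' >> sample.tsv
printf '\tQ12345\tblank first\n' >> sample.tsv
printf 'CH123\tGAG123\tdrop\n'   >> sample.tsv

# Either of the first two columns starts with P or Q.
either=$(awk -F'\t' '$1 ~ /^[PQ]/ || $2 ~ /^[PQ]/' sample.tsv)

# Both of the first two columns start with P or Q
# (this seems closer to what the question asks for).
both=$(awk -F'\t' '$1 ~ /^[PQ]/ && $2 ~ /^[PQ]/' sample.tsv)

printf 'either:\n%s\n\nboth:\n%s\n' "$either" "$both"
```

With -F'\t' awk keeps empty fields in place, so a blank first column stays column 1 instead of shifting the other fields left.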

Answer 2

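use strict;
use warnings;

# Open the input file; adjust the name to match your data.
open my $fh, '<', 'myfile.txt' or die "Cannot open myfile.txt: $!";

my @wanted;
while (<$fh>) {
    my @fields = split /\t/, $_;
    # Keep the line if either of the first two IDs starts with P or Q.
    if ( $fields[0] =~ /^[PQ]/ or $fields[1] =~ /^[PQ]/ ) {
        push @wanted, $_;
    }
}
close $fh;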

If you want both IDs to start with P or Q, replace or with and.

If you just want to write the wanted lines to another file, simply do:

perl -F'\t' -wnae 'print if (($F[0] =~ /^[PQ]/) or ($F[1] =~ /^[PQ]/))' input.txt > output.txt

Or as a script, run as script.pl input.txt > output.txt:

use warnings;
use strict;

while (<>) {
    my @fields = split(/\t/, $_);
    print if ( $fields[0] =~ /^[PQ]/ and $fields[1] =~ /^[PQ]/ );

}

Note that you should write the split pattern as a regex, /\t/, rather than as the string '\t'.

Answer 3

You can also do it in one line:

cat yourfile.txt | perl -e 'while (<>) { print if m/^[PQ][^\t]*\t[PQ]/ }'

Removing lines that don't have a specified pattern in particular columns of a file
