I have a txt file with taxonomic assignments, such as:

```
#name_file
Bacteria;WS3;PRR-12;SSS58A 0.0 0.12 0.6
Bacteria;WS3;PRR-12;Sediment-1 0.5 0.1 0.3
Bacteria;Terrabacteria_group;Firmicutes;Bacilli; unclassified_Bacillales;Bacillaceae;Vulcanibacillu 0.2 0.2 0.6
Bacteria;Terrabacteria_group;Firmicutes;Bacilli;Bacillales;Bacillaceae;Vulcanibacillu 0.2 0.2 0.6
Bacteria;Terrabacteria_group;Firmicutes;Bacilli;Bacillales;Bacillales_incertae_sedis;Bacillales_Family_X 0.1 0.3 0.5
Bacteria;Terrabacteria_group;Firmicutes;Bacilli;Bacillales;Bacillales_incertae_sedis;Bacillales_Family_X._Incertae_Sedis;Thermicanus 0.4 0.13 0.9
Bacteria;Nitrospirae;Nitrospira;Nitrospirales;Thermodesulfovibrionaceae 0.1 0.2 0.6
Bacteria;Nitrospirae;Nitrospira;Nitrospirales;Thermodesulfovibrionaceae;BD2-6 0.0 0.0 0.6
Bacteria;PVC_group;Lentisphaerae;Lentisphaeria;Lentisphaerales 0.7 0.2 0.1
```
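So, from each line I want to extract the first word matching "ales", plus the second one (only when the second one ends in ales_incertae_sedis), so the printed output would be:

```
Bacillales
Bacillales;Bacillales_incertae_sedis
Bacillales;Bacillales_incertae_sedis
Nitrospirales
Nitrospirales
Lentisphaerales
```

but not the third one:

```
Bacillales;Bacillales_incertae_sedis;Bacillales_Family
```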
I tried:

```perl
use strict;
use warnings;
use Getopt::Long;

my $infile;
my $results = 'results';    # prefix of the output file name
GetOptions(
    'i=s' => \$infile,
);

open INFILE,  '<', $infile        or die "can't open file $infile: $!";
open OUTFILE, '>', "$results.txt" or die "can't open $results.txt: $!";

while (my $line = <INFILE>) {
    chomp $line;
    next if $line =~ /^#/;    # skip the header
    next if $line =~ /^$/;    # skip empty lines

    # columns: taxonomy string followed by three values
    my ($taxon, $val1, $val2, $val3) = split /\t/, $line;

    # here is the problem
    my (@orden) = ($taxon=~ m/(\w*ales)[\;]?/g);
    foreach (@orden) {
        next if /^$/;
        next if /^unclassified/;
        print OUTFILE "$_\n";
    }
}
close INFILE;
close OUTFILE;
exit;
```
My problem is this line:

```perl
my (@orden) = ($taxon=~ m/(\w*ales)[\;]?/g);
```

I tried matching both options with:

```perl
my (@orden) = ($taxon=~ m/(\w*ales)[\;]?(;\w*ales_incertae_sedis)/g);
my (@orden) = ($taxon=~ m/(\w*ales[;\w*ales_incertae_sedis]?)[\;]?/g);
```

but it doesn't work.

Thanks a lot.
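To see why the original line and both attempts fail, it helps to print what each pattern returns in list context. The sketch below assumes the taxon column has already been split off; the two strings are copied from the sample data (the second one is cut at `Bacillaceae` for brevity).

```perl
use strict;
use warnings;

# Taxon column of a line that has a Bacillales_incertae_sedis rank
my $taxon = 'Bacteria;Terrabacteria_group;Firmicutes;Bacilli;Bacillales;'
          . 'Bacillales_incertae_sedis;Bacillales_Family_X';
# Taxon column of a line without one
my $plain = 'Bacteria;Terrabacteria_group;Firmicutes;Bacilli;Bacillales;Bacillaceae';

# Original line: in list context //g returns EVERY "...ales" match, and
# \w*ales stops at the last "ales" it can reach, so all three ranks give "Bacillales".
my @orden = ($taxon =~ m/(\w*ales)[\;]?/g);

# First attempt: the second group is mandatory, so it matches on $taxon ...
my @with_group = ($taxon =~ m/(\w*ales)[\;]?(;\w*ales_incertae_sedis)/g);
# ... but finds nothing at all on a plain "Bacillales" line.
my @no_match = ($plain =~ m/(\w*ales)[\;]?(;\w*ales_incertae_sedis)/g);

# Second attempt: [...] is a character class, i.e. ONE character from the set
# {word chars, ';', '*'}, not the sequence ";...ales_incertae_sedis".
my @with_class = ($taxon =~ m/(\w*ales[;\w*ales_incertae_sedis]?)[\;]?/g);

print join('|', @orden), "\n";         # Bacillales|Bacillales|Bacillales
print join('|', @with_group), "\n";    # Bacillales|;Bacillales_incertae_sedis
print scalar(@no_match), "\n";         # 0
print join('|', @with_class), "\n";    # Bacillales;|Bacillales_|Bacillales_
```

In short, the pattern needs an optional *sequence* after the first "ales" word, and only the first match per line should be kept.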
Answer 0 (score: 0)

Try this:
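```perl
use warnings;
use strict;

my $m;
while (<>)    # read the file(s) given on the command line
{
    # $1: an "...ales" rank plus the next rank, when a later rank contains "family"
    # $2: otherwise, the first word ending in "ales" (also at the end of the taxon)
    if ($_ =~ /(?:([a-z]+ales;[^;]+);.+?family|(\w+ales)\b)/i)
    {
        $m = $1 || $2;
        print "$m\n" if ($m !~ /^unc/);    # skip unclassified_* assignments
    }
}
```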
In the above, I used a non-capturing group `(?:...)`.
For more information about non-capturing groups, see this answer.
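A non-capturing group is also what the question's first attempt was missing: wrapping `;\w+ales_incertae_sedis` in `(?:...)` and adding `?` makes the second rank optional without creating a second capture. The following is a minimal sketch of the rule as the question states it (keep the second rank only when it ends in `ales_incertae_sedis`), rather than the answer's "later rank contains family" test. `extract_order` is a hypothetical helper name, and matching against the whole line assumes, as in the sample, that the numeric columns never contain "ales".

```perl
use strict;
use warnings;

# Hypothetical helper: return the first "...ales" rank of a taxonomy line, plus
# the next rank only when that rank ends in "ales_incertae_sedis".
# Returns undef when there is no "...ales" rank or it is unclassified.
sub extract_order {
    my ($line) = @_;
    # (?:...) groups ";<word>ales_incertae_sedis" so the trailing "?" makes the
    # whole sequence optional without creating a second capture ($2).
    return undef unless $line =~ /(\w+ales(?:;\w+ales_incertae_sedis)?)\b/;
    my $order = $1;
    return undef if $order =~ /^unclassified/;
    return $order;
}

# Sample lines copied from the question (columns shown space-separated)
my @lines = (
    '#name_file',
    'Bacteria;WS3;PRR-12;SSS58A 0.0 0.12 0.6',
    'Bacteria;WS3;PRR-12;Sediment-1 0.5 0.1 0.3',
    'Bacteria;Terrabacteria_group;Firmicutes;Bacilli; unclassified_Bacillales;Bacillaceae;Vulcanibacillu 0.2 0.2 0.6',
    'Bacteria;Terrabacteria_group;Firmicutes;Bacilli;Bacillales;Bacillaceae;Vulcanibacillu 0.2 0.2 0.6',
    'Bacteria;Terrabacteria_group;Firmicutes;Bacilli;Bacillales;Bacillales_incertae_sedis;Bacillales_Family_X 0.1 0.3 0.5',
    'Bacteria;Terrabacteria_group;Firmicutes;Bacilli;Bacillales;Bacillales_incertae_sedis;Bacillales_Family_X._Incertae_Sedis;Thermicanus 0.4 0.13 0.9',
    'Bacteria;Nitrospirae;Nitrospira;Nitrospirales;Thermodesulfovibrionaceae 0.1 0.2 0.6',
    'Bacteria;Nitrospirae;Nitrospira;Nitrospirales;Thermodesulfovibrionaceae;BD2-6 0.0 0.0 0.6',
    'Bacteria;PVC_group;Lentisphaerae;Lentisphaeria;Lentisphaerales 0.7 0.2 0.1',
);

for my $line (@lines) {
    next if $line =~ /^#/;    # skip the header line
    my $order = extract_order($line);
    print "$order\n" if defined $order;
}
```

Run on these lines, it prints the six lines of expected output listed in the question. The trailing `\b` keeps `\w+ales` from stopping in the middle of a longer word, and because the regex is not `/g`, only the first "ales" rank on each line is used.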