I have a file like this:
a score=-120.0
s Chicken.chr22 947 4 + 4081097 tgag
s Turkey.chrZ 31560312 4 - 81011772 ttct
s Mallard.apl2 2559751 4 - 153042893 TTCG

a score=61344.0
s Chicken.chr22 951 15 + 4081097 c------tgggtgaagcactg
s Turkey.chrZ 31560316 15 - 81011772 t------tgggtaaggaactg
s Mallard.apl2 2559755 15 - 153042893 T------TGGGTTAGAAACTG
s Rock_pigeon.scaffold637 370291 15 + 418352 G------AGGGTCAGTTTCTG
s Common_cuckoo.scaffold569 739303 15 + 1009149 C------TGGGTTGAAAACTG
s Anna_s_hummingbird.scaffold44 3039342 15 - 10500161 C------TGGGTTAAACACTG
s Hoatzin.scaffold186 66281 15 + 155126 C------TGGATAAAGAACTG
s Emperor_penguin.Scaffold155 7152296 15 - 9595628 C------TGGGTAAAAAATTG
s Adelie_penguin.scaffold207 570235 15 - 3061884 C------TGGGTCAAAAACTG
s Crested_ibis.scaffold108 24271571 15 - 27015053 C------TGAGTAAAAACCTG
s Little_egret.scaffold238 365328 14 + 1015180 -------TGGGTTAAAAACTG
s Peregrine_falcon.scaffold41_1 3239034 14 - 3351735 -------TGGGTTAAAAGCTG
s Budgerigar.megascaffold18 4987476 14 + 17573940 -------TGGATAAAGAACTG
s Golden_collared_manakin.scaffold312 1652783 16 + 1993610 A-----CAGGGTTAGGAACTG
s Downy_woodpecker.scaffold1064 9341 21 - 117330 AGTGAGGTGGATTGTGAACTG
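Each block has a first line that starts with "a", followed by lines that start with "s". A blank line then separates one block from the next.
Unfortunately, each block contains a different number of "s" lines.
I want to collect the blocks (from several files with the same format) that have the first line (starting with "a") and whose number of "s" lines equals a number that I pass as an argument.
I wrote the following script, but it doesn't work. Can someone help me?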
#!/usr/bin/perl
use strict;
#use warnings;
use POSIX;
my $maf = $ARGV[0];
my $species = $ARGV[1];
#It filters the maf file. takes the blocks with all the species
open my $maf_file, $maf or die "Could not open $maf: $!";
my $count = 0;
my @array;
while (my $mline = <$maf_file>) {
next if /^\s*#/; #to avoid some lines with comments
if ($mline =~ /^a/) {
push(@array, $mline);
}
if ($mline =~ /^s/) {
until ($mline != ~/\s/) {
push(@array, $mline);
$count += 1;
}
foreach (@array) {
if ($count == $species) {
print "$_\n";
}
}
undef(@array);
}
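Answer 0: (score: 1)
If you have a file organized in blocks, you can usually change Perl's input record separator ($/) so that each read returns a whole block instead of a single line. Here is a general sketch.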
# You should enable these.
use strict;
use warnings;
# Change the input record separator.
# You typically want to make this change within a subroutine or other narrowly
# scoped location within your program.
local $/ = "\n\n";
while (my $block = <>){
my @lines = split /\n/, $block;
# Do stuff with the lines in a block.
}
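Applied to the question, the sketch above could be fleshed out roughly as follows. This is only a minimal sketch: the subroutine name blocks_with_n_species and the in-memory sample data are made up for illustration, and it assumes that blocks are separated by exactly one blank line.

```perl
use strict;
use warnings;

# Hypothetical helper: return the blocks read from $fh that contain
# exactly $wanted lines starting with "s".
sub blocks_with_n_species {
    my ($fh, $wanted) = @_;
    local $/ = "\n\n";              # one record per blank-line-separated block
    my @keep;
    while (my $block = <$fh>) {
        my @lines   = split /\n/, $block;
        my $s_count = grep { /^s\s/ } @lines;   # count only the "s" lines
        push @keep, join("\n", @lines) if $s_count == $wanted;
    }
    return @keep;
}

# Made-up sample data, read through an in-memory filehandle.
my $sample = "a score=1\ns A 0 1 + 9 t\ns B 0 1 + 9 t\n\n"
           . "a score=2\ns A 1 1 + 9 g\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @selected = blocks_with_n_species($fh, 2);
print "$_\n\n" for @selected;
```

Counting the "s" lines directly, rather than the total number of lines in the block, keeps the test independent of comment lines or other line types that may appear inside a block. For a real file, the filehandle would come from open on the file name and the count from the command line.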
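Answer 1: (score: 0)
You haven't really asked a question, so it's hard to offer much help. But if you just want to put each block into a separate array element, that is very simple. You only need to set $/ to the empty string to put Perl into "paragraph mode".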
open my $maf_file, '<', $maf or die "Could not open $maf: $!";
my @blocks;
{
local $/ = ''; # always localise changes to Perl's special variables
@blocks = <$maf_file>;
}
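To get from @blocks to the blocks the question asks for, one possible continuation is to grep on the number of "s" lines. The sketch below also shows how paragraph mode differs from setting $/ to "\n\n": a run of several blank lines counts as a single separator. The sample data and the variable $wanted are invented for the example.

```perl
use strict;
use warnings;

# Made-up sample with two blank lines between the blocks.
my $sample = "a score=1\ns A 0 1 + 9 t\ns B 0 1 + 9 t\n\n\n"
           . "a score=2\ns A 1 1 + 9 g\n";
my $wanted = 2;    # stands in for the number passed on the command line

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @blocks;
{
    local $/ = '';          # paragraph mode
    @blocks = <$fh>;
}
close $fh;

# Keep the blocks whose number of "s" lines equals $wanted.
my @selected = grep {
    my $s_count = () = /^s\s/mg;    # count lines starting with "s"
    $s_count == $wanted;
} @blocks;

print for @selected;
```

Reading everything into @blocks is convenient for small inputs. For very large alignment files, processing one block at a time in a while loop avoids holding the whole file in memory.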
Answer 2: (score: 0)
I believe I have solved it, based on FMc's help. Thank you very much!
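#!/usr/bin/perl
use strict;
use warnings;

my ($maf, $species) = @ARGV;
die "Usage: $0 file.maf number_of_species\n" unless defined $maf;

# Force the number of species to 1 when it is missing, empty, or 0.
$species = 1 unless $species;

open my $fh, '<', $maf or die "Could not open $maf: $!";

local $/ = "\n\n";    # read one blank-line-separated block at a time
while (my $block = <$fh>) {
    # Split the block that was just read (not a fresh read from <>).
    my @lines = split /\n/, $block;
    # A matching block has one "a" line plus $species "s" lines.
    next unless @lines == $species + 1;
    print "$_\n" for @lines;
    print "\n";       # blank line between the blocks that are printed
}
close $fh;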