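Good afternoon. I am trying to count the number of times the letters A, C, G and T occur in a DNA sequence using Perl 6. I have already solved it other ways; I am just trying to get it done in yet another way. Here is some of the code I came up with: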
use v6;
my $default-input = "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC";
sub MAIN(Str $input = $default-input)
{
    say "{bag($input.comb)<A C G T>}";
}
use v6;
my $default-input = "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC";
sub MAIN($input = $default-input)
{
    "{<A C G T>.map({ +$input.comb(/$_/) })}".say;
}
Sample dataset
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC
Answer 0 (score: 7)
multi sub MAIN ( \DNA ) {
    my Int %bag = A => 0, C => 0, G => 0, T => 0;
    # doesn't keep the whole thing in memory
    # like .comb.Bag would have
    for DNA.comb {
        %bag{$_}++
    }
    .say for %bag<A C G T> :p;
}
multi sub MAIN ( 'example' ){
    samewith "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC"
}
multi sub MAIN ( Bool :STDIN($)! ){
    samewith $*IN
}
multi sub MAIN ( Str :filename(:$file)! where .IO.f ){
    samewith $file.IO
}
~$ ./test.p6
Usage:
./test.p6 <DNA>
./test.p6 example
./test.p6 --STDIN
./test.p6 --filename|--file=<Str>
~$ ./test.p6 example
A => 20
C => 12
G => 17
T => 21
~$ ./test.p6 --STDIN < test.in
A => 20
C => 12
G => 17
T => 21
~$ ./test.p6 --file=test.in
A => 20
C => 12
G => 17
T => 21
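The two counting styles that appear in this thread (a Bag built from `.comb`, and a hash incremented in a loop) can be compared side by side. This is only a minimal sketch: the helper names `count-with-bag` and `count-with-hash` are hypothetical and do not come from the question or the answer, and the sketch ignores any character other than A, C, G and T.

```perl6
use v6;

# Hypothetical helper: build a Bag of all characters, then look up the four bases.
sub count-with-bag(Str $dna) {
    my $bag = $dna.comb.Bag;
    # A Bag returns 0 for a key it has never seen.
    my %counts = <A C G T>.map(-> $base { $base => $bag{$base} });
    return %counts;
}

# Hypothetical helper: walk the characters once and increment a typed hash.
sub count-with-hash(Str $dna) {
    my Int %counts = A => 0, C => 0, G => 0, T => 0;
    for $dna.comb -> $base {
        # Skip anything that is not one of the four expected bases.
        %counts{$base}++ if %counts{$base}:exists;
    }
    return %counts;
}

my $dna = "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC";
my %from-bag  = count-with-bag($dna);
my %from-hash = count-with-hash($dna);
say %from-bag<A C G T>;
say %from-hash<A C G T>;
```

For the sample dataset both lines should agree with the counts printed in the transcript above.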
Answer 1 (score: 3)
Another way is to use the BioInfo modules I'm working on, which already provide a coercion to Bag for you :)
use v6;
use BioInfo;
my @sequences = `
>seqid
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC
`;
for @sequences -> $seq {
    say $seq.Bag;
}
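In the code above you are importing a special bioinformatics Slang which understands that string literals between `` are FASTA literals. DNA, RNA and amino acid sequences are detected automatically, and you get an object of the specific class for that kind of sequence. The object has its own .Bag which does what you want. Other than my own modules there is also the BioPerl6 project.
If you want to read from a file, then the following should work for you: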
use v6;
use BioInfo::Parser::FASTA;
use BioInfo::IO::FileParser;
# Spawn an IO thread that parses the file and creates BioInfo::Seq objects on .get
my $seq_file = BioInfo::IO::FileParser.new(file => 'myseqs.fa', parser => BioInfo::Parser::FASTA);
# Print the residue counts per file
while my $seq = $seq_file.get() {
    say $seq.Bag;
}
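For readers without the BioInfo modules installed, the same idea (read FASTA records, then count residues per record) can be sketched in plain Perl 6. This is an illustration only and says nothing about how BioInfo is implemented: `parse-fasta` is a hypothetical helper, it assumes every record starts with a `>` header line, and it silently drops any sequence text that appears before the first header.

```perl6
use v6;

# Hypothetical helper, not part of BioInfo: turn FASTA text into a list of
# hashes, each with an "id" and a "seq" entry.
sub parse-fasta(Str $text) {
    my @records;
    my $id;
    my $seq = '';
    for $text.lines -> $line {
        my $trimmed = $line.trim;
        next unless $trimmed.chars;            # skip blank lines
        if $trimmed.starts-with('>') {         # a header line starts a new record
            @records.push(%( id => $id, seq => $seq )) if $id.defined;
            $id  = $trimmed.substr(1);
            $seq = '';
        }
        else {
            $seq ~= $trimmed;                  # a sequence may span several lines
        }
    }
    # Store the last record, if there was one.
    @records.push(%( id => $id, seq => $seq )) if $id.defined;
    return @records;
}

my $fasta = q:to/END/;
>seqid
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC
END

for parse-fasta($fasta) -> %record {
    say %record<id>, ': ', %record<seq>.comb.Bag<A C G T>;
}
```

To read from a file instead, the text could be obtained with something like `'myseqs.fa'.IO.slurp` and passed to the same helper; the file name here is the one used in the answer above.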