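I need a Perl script to join lines.
I have more than 1,000 gene names (such as >pmpI), each with its function (such as polymorphic outer membrane protein) on a separate line. I want to join each function onto the line with its gene name, so the data is easy to read and can be saved for future reference.
For example, the file content looks like this: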
>pmpG
polymorphic outer membrane protein
>pmpH
polymorphic outer membrane protein
>CTA_0953
hypothetical protein
>pmpI
polymorphic outer membrane protein
I tried doing this by hand in Excel, but that is not feasible for this many files, so I'm asking programmers for help.
The program's output should look like this:
>pmpG polymorphic outer membrane protein
>pmpH polymorphic outer membrane protein
>CTA_0953 hypothetical protein
>pmpI polymorphic outer membrane protein
Answer 0 (score: 3)
As a one-liner, this would be:
perl -n -e 's/^\s+//; s/\s+$//; next unless $_ ne ""; if (/^[>]/) { $n = $_; } else { printf "%-11s%s\n", $n, $_; }' < data.txt
For clarity, written out as a standalone Perl program it looks like this:
#!/usr/bin/perl
use strict;
use warnings;

my $n;                                   # the most recent header line
while (<>) {                             # iterate over all lines
    s/^\s+//;                            # remove whitespace at the beginning...
    s/\s+$//;                            # ...and the end of the line
    next if $_ eq "";                    # ignore empty lines
    if (/^>/) { $n = $_; }               # if the line starts with >, remember it
    else { printf "%-11s%s\n", $n, $_; } # otherwise output the remembered
                                         # header and the current line
}
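This reads your content from standard input, so invoke it as perl program.pl < data.txt (or perl program.pl data.txt, since <> also reads files named on the command line).
The content is expected to be in data.txt; change that to your actual file name.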
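Because the question mentions many files, here is a minimal sketch (not part of the original answer) that applies the same trim-and-join logic to every file named on the command line and writes each result next to its input. The helper name `join_lines`, the script name `join_all.pl`, and the `.joined.txt` output suffix are hypothetical choices; adjust them as needed. Unlike the program above, it separates name and function with a single space, as in the requested output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Join each ">name" header line with the description line(s) that follow it.
# Returns one "header description" string per record. Description lines that
# appear before the first header are skipped.
sub join_lines {
    my @lines = @_;
    my @records;
    for my $line (@lines) {
        $line =~ s/^\s+//;             # trim leading whitespace
        $line =~ s/\s+$//;             # trim trailing whitespace, including \r\n
        next if $line eq '';           # skip empty lines
        if ($line =~ /^>/) {
            push @records, $line;      # a header starts a new record
        } elsif (@records) {
            $records[-1] .= " $line";  # append the description to the current record
        }
    }
    return @records;
}

# Process every file named on the command line, e.g.: perl join_all.pl *.txt
for my $in (@ARGV) {
    open my $ifh, '<', $in or die "Cannot read $in: $!";
    my @joined = join_lines(<$ifh>);
    close $ifh;

    my $out = "$in.joined.txt";        # hypothetical output naming scheme
    open my $ofh, '>', $out or die "Cannot write $out: $!";
    print {$ofh} "$_\n" for @joined;
    close $ofh or die "Cannot close $out: $!";
}
```

Running perl join_all.pl *.txt would write data.txt.joined.txt next to data.txt, and so on. Running it again with the same glob would also pick up the .joined.txt files, so writing into a separate output directory may be preferable.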
Answer 1 (score: 0)
With some explanatory comments...
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
# Store the current line
my $line;
while (<DATA>) {
# Remove the newline
chomp;
# If the line starts with '>'
if (/^>/) {
# Output the current $line
# (if we have one)
say $line if $line;
# Set $line to this line
$line = $_;
} else {
# Append this line to $line
$line .= "\t$_";
}
}
# Output the last record (if there was any input)
say $line if defined $line;
__DATA__
>pmpG
polymorphic outer membrane protein
>pmpH
polymorphic outer membrane protein
>CTA_0953
hypothetical protein
>pmpI
polymorphic outer membrane protein
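Both answers scan the input line by line. A different technique, not taken from either answer, is to change Perl's input record separator `$/` so that each read returns a whole `>` record at once. The sketch below assumes each file starts with a header and that `>` never appears inside a gene name or description; the sub name `join_records` and the script name `join_records.pl` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Alternative technique: set the input record separator $/ to ">" so that
# each read returns one whole record (the text up to the next ">").
# Assumes ">" occurs only at the start of header lines.
sub join_records {
    my ($fh) = @_;
    my @out;
    local $/ = '>';                      # records end at the next ">"
    while (my $record = <$fh>) {
        chomp $record;                   # chomp removes the trailing ">" ($/)
        next unless $record =~ /\S/;     # skip the empty piece before the first ">"
        # First line is the gene name; the remaining lines are the description.
        my ($name, @desc) = split /\s*\n\s*/, $record;
        push @out, join ' ', ">$name", @desc;
    }
    return @out;
}

# Print the joined records for each file named on the command line.
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot read $file: $!";
    print "$_\n" for join_records($fh);
    close $fh;
}
```

For example, perl join_records.pl data.txt prints the joined lines to standard output, which can be redirected to a new file. A description spread over several lines is joined with single spaces.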