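I'm having trouble fixing a bug in my code. I'm trying to get the script to read an input file and extract only what is between the brackets []. However, I get the error "readline() on unopened filehandle ...", and I'm not sure what I'm doing wrong with the filehandle in the while () loop.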
#!/usr/bin/perl
use warnings;
my $file = '';
my $newfile = '';
open($newfile, '>', 'newmyosin.fasta') or die "Can't create file", $!;
open($file, '<', 'myosin.fasta') or die "Can't open file", $!;
while(<$file>) {
    print;
    chomp;
    if ( $_ =~ /\[(.+)\]/ ) {
        $file = $1;
    }
}
So, for example, this would be part of my input file:
>gi|115527082|ref|NP_005954.3| myosin-1 [Homo sapiens]
>gi|226694176|sp|P12882.3|MYH1_HUMAN RecName: Full=Myosin-1; AltName: Full=Myosin heavy chain 1; AltName: Full=Myosin heavy chain 2x; Short=MyHC-2x; AltName: Full=Myosin heavy chain IIx/d; Short=MyHC-IIx/d; AltName: Full=Myosin heavy chain, skeletal muscle, adult 1 [Homo sapiens]
>gi|119610411|gb|EAW90005.1| hCG1986604, isoform CRA_b [Homo sapiens]
MSSDSEMAIFGEAAPFLRKSERERIEAQNKPFDAKTSVFVVDPKESFVKATVQSREGGKVTAKTEAGATVTVKDDQVFPM
NPPKYDKIEDMAMMTHLHEPAVLYNLKERYAAWMIYTYSGLFCVTVNPYKWLPVYNAEVVTAYRGKKRQEAPPHIFSISD
NAYQFMLTDRENQSILITGESGAGKTVNTKRVIQYFATIAVTGEKKKEEVTSGKMQGTLEDQIISANPLLEAFGNAKTVR
NDNSSRFGKFIRIHFGTTGKLASADIETYLLEKSRVTFQLKAERSYHIFYQIMSNKKPDLIEMLLITTNPYDYAFVSQGE
ITVPSIDDQEELMATDSAIEILGFTSDERVSIYKLTGAVMHYGNMKFKQKQREEQAEPDGTEVADKAAYLQNLNSADLLK
ALCYPRVKVGNEYVTKGQTVQQVYNAVGALAKAVYDKMFLWMVTRINQQLDTKQPRQYFIGVLDIAGFEIFDFNSLEQLC
INFTNEKLQQFFNHHMFVLEQEEYKKEGIEWTFIDFGMDLAACIELIEKPMGIFSILEEECMFPKATDTSFKNKLYEQHL
GKSNNFQKPKPAKGKPEAHFSLIHYAGTVDYNIAGWLDKNKDPLNETVVGLYQKSAMKTLALLFVGATGAEAEAGGGKKG
GKKKGSSFQTVSALFRENLNKLMTNLRSTHPHFVRCIIPNETKTPGAMEHELVLHQLRCNGVLEGIRICRKGFPSRILYA
DFKQRYKVLNASAIPEGQFIDSKKASEKLLGSIDIDHTQYKFGHTKVFFKAGLLGLLEEMRDEKLAQLITRTQAMCRGFL
ARVEYQKMVERRESIFCIQYNVRAFMNVKHWPWMKLYFKIKPLLKSAETEKEMANMKEEFEKTKEELAKTEAKRKELEEK
MVTLMQEKNDLQLQVQAEADSLADAEERCDQLIKTKIQLEAKIKEVTERAEDEEEINAELTAKKRKLEDECSELKKDIDD
LELTLAKVEKEKHATENKVKNLTEEMAGLDETIAKLTKEKKALQEAHQQTLDDLQAEEDKVNTLTKAKIKLEQQVDDLEG
SLEQEKKIRMDLERAKRKLEGDLKLAQESTMDIENDKQQLDEKLKKKEFEMSGLQSKIEDEQALGMQLQKKIKELQARIE
ELEEEIEAERASRAKAEKQRSDLSRELEEISERLEEAGGATSAQIEMNKKREAEFQKMRRDLEEATLQHEATAATLRKKH
ADSVAELGEQIDNLQRVKQKLEKEKSEMKMEIDDLASNMETVSKAKGNLEKMCRALEDQLSEIKTKEEEQQRLINDLTAQ
RARLQTESGEYSRQLDEKDTLVSQLSRGKQAFTQQIEELKRQLEEEIKAKSALAHALQSSRHDCDLLREQYEEEQEAKAE
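From this, I want to create a new file, "newmyosin.fasta", containing the organism names extracted from the brackets in each sample's header (e.g. [Homo sapiens]). The Perl code should read in the myosin.fasta file above, which holds multiple samples, pick out the names inside the brackets [], and write them to a new file (e.g. newmyosin.fasta).
Thanks!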
Answer 0 (score: 2)
When you do this:
$file = $1;
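you overwrite the filehandle with the captured string. After that you can no longer read from the file, and you get the error you mentioned.
You should of course save the match somewhere else, e.g.: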
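To make the failure concrete, here is a small sketch that reproduces it. It reads from an in-memory string instead of myosin.fasta (an assumption, so the snippet runs on its own), leaves `use strict` off as the question's script does, and collects warnings in a hypothetical `@warnings` array so they can be printed at the end.

```perl
#!/usr/bin/perl
use warnings;
# 'use strict' is deliberately left off, as in the question's script;
# with strict refs enabled the second read would die instead of warn.

my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };  # collect warnings

# In-memory stand-in for myosin.fasta (assumption, for illustration).
my $data = "myosin-1 [Homo sapiens]\nsecond line\n";
open(my $fh, '<', \$data) or die "Can't open in-memory input: $!";

my $first = <$fh>;            # works: $fh is still a filehandle
if ($first =~ /\[(.+)\]/) {
    $fh = $1;                 # bug: $fh is now a plain string
}
my $second = <$fh>;           # tries to read from a handle named by that string

print "fh is now: $fh\n";
print "second read defined? ", (defined $second ? "yes" : "no"), "\n";
print "warning: $_" for @warnings;
```

The second read returns undef and triggers the "unopened filehandle" warning, which is why the while loop in the question stops after the first matching line.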
my $match = $1;
and probably print it to the output file as well:
print $newfile "$match\n";
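Putting these suggestions together, a minimal sketch of the corrected loop might look like the following. The sub name `extract_organisms` is hypothetical, and the sample headers are read from and written to in-memory strings rather than myosin.fasta and newmyosin.fasta so that the snippet runs on its own; filehandles opened on the real files should work the same way.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy every bracketed name found on the input
# handle to the output handle, one name per line.
sub extract_organisms {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        chomp $line;
        if ($line =~ /\[(.+)\]/) {
            my $match = $1;           # keep the capture in its own variable
            print {$out} "$match\n";  # the filehandles are never reassigned
        }
    }
    return;
}

# Sample data standing in for myosin.fasta (assumption, for illustration).
my $sample = <<'END';
>gi|115527082|ref|NP_005954.3| myosin-1 [Homo sapiens]
MSSDSEMAIFGEAAPFLRKSERERIEAQNKPFDAKTSVFVVDPKESFVKATVQSREGGKV
>gi|119610411|gb|EAW90005.1| hCG1986604, isoform CRA_b [Homo sapiens]
END

open(my $in, '<', \$sample) or die "Can't open in-memory input: $!";
my $result = '';
open(my $out, '>', \$result) or die "Can't open in-memory output: $!";

extract_organisms($in, $out);

close($in);
close($out);
print $result;
```

With real files, the two open calls would take 'myosin.fasta' and 'newmyosin.fasta' in place of the scalar references.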
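Answer 1 (score: 0)
As I said in my comment, you are reassigning the filehandle to the capture group while you are still reading the file. Since you open a separate file for output, I assume you want to print the matched strings to that file.
That said, your requirements are quite vague, your sample input does not look accurate, and you have not provided any sample output. If I understand your intent, though, I think this is what you want: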
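my $file = 'myosin.fasta';
my $tmp = "$file.tmp";
open(my $new, '>', $tmp) or die "Can't open $tmp: $!";
open(my $old, '<', $file) or die "Can't open $file: $!";
while (<$old>) {
    # capture the text inside the first [...] on the line
    if (/\[([^]]+)\]/) {
        print $new "$1\n";
    }
}
close($old);
close($new);
# keep the original as a backup, then move the result into place
rename($file, "$file.bak") or die "Can't rename $file: $!";
rename($tmp, $file) or die "Can't rename $tmp: $!";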
Contents of myosin.fasta after running the script:
Homo sapiens
Homo sapiens
Homo sapiens
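One detail worth noting is that this answer matches with [^]]+ where the question used .+. The two behave the same on the sample headers, but differ when a line contains more than one bracketed group. The sketch below illustrates this with a made-up header (an assumption, not taken from the input file); the variable names and the end-anchored third pattern are likewise only illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical header with two bracketed groups (not from myosin.fasta).
my $header = '>id example [fragment] [Homo sapiens]';

# Greedy .+ runs from the first '[' to the last ']'.
my ($greedy) = $header =~ /\[(.+)\]/;

# [^]]+ stops at the first ']', so it captures the first group only.
my ($first) = $header =~ /\[([^]]+)\]/;

# Anchoring at the end of the line picks the last group instead.
my ($last) = $header =~ /\[([^]]+)\]\s*$/;

print "greedy: $greedy\n";  # fragment] [Homo sapiens
print "first:  $first\n";   # fragment
print "last:   $last\n";    # Homo sapiens
```

If the organism name is always the final bracketed group in a header, the end-anchored pattern may be the safer choice.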