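Update (2):
I changed the code to discard the comments in the header, but I am still running into a syntax error at the hash key/value assignment:
syntax error at ./convertDataToGeneSymbol.pl line 99, near "$geneSymbolToGo{"
syntax error at ./convertDataToGeneSymbol.pl line 101, near "}"
I can't spot any errors in the code, so I'm wondering whether the array is unable to read the value for $go?
Here is the header of input file 3:
! 10-20 lines of comments
UniProtKB /t A0A021WW37 /t CG17167 /t GO:0016021 /t GO_REF:0000038
(Still learning how to format on this site; /t means tab-separated.)
P.S. Sorry about the comments. My professor requires extensive commenting for our class. strict has been giving me some trouble with this program (mostly due to my inexperience), but when I remove it I get the results I want. Thanks for all the help so far!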
#!/usr/bin/perl
use warnings;
use diagnostics;
# Title: convertDataToGeneSymbol.pl
# Author: Nicholas Bense
# Date: 11/4/15
# Open a filehandle to read file #1
open(INF1,"<",'/scratch/Drosophila/fb_synonym_fb_2014_05.tsv' ) or die $!;
# Open a filehandle to read file #2
open(INF2,"<",'/scratch/Drosophila/FlyRNAi_data_baseline_vs_EGF.txt') or die $!;
# Open a filehandle to read file #3
open(INF3,"<",'/scratch/Drosophila/gene_association.goa_fly') or die $!;
# Open a filehandle to write new file
open(OUTF1,">",'FlyRNAi_data_baseline_vs_EGFSymbol.txt') or die $!;
# Open a filehandle to write new file
open(OUTF2,">",'FlyRNAi_data_baseline_vs_EGF_GO.txt') or die $!;
# Initialize a hash for the gene symbol conversion
my %geneSymbolConversion;
# Read input file 1 line by line
while (<INF1>){
    # Get rid of whitespace
    chomp;
    # Split the line
    my @inf1Array = split("\t", $_);
    # Filter entries starting with FBgn
    if ($inf1Array[0] =~ /(^FBgn\d+)/){
        # Assign column 1 to hash key scalar
        my $geneID = $inf1Array[0];
        # Assign column 2 to hash value scalar
        my $geneSymbol = $inf1Array[1];
        # Assign key and value to hash
        $geneSymbolConversion{$geneID} = $geneSymbol;
    }
}
# Discard first line of input file 2
<INF2>;
# Read input file 2 line by line
while (<INF2>){
    # Get rid of whitespace
    chomp;
    # Split the line on tabs
    my ($geneID, $egf_Baseline, $egf_Stimulus) = split("\t", $_);
    # Check if the codon is present in the hash
    if (defined $geneSymbolConversion{$geneID}){
        # Get the value associated with the codon from the hash
        $geneSymbol = $geneSymbolConversion{$geneID};
    }
    # Join data and print to output file
    print OUTF1 join( "\t", $geneSymbol, $egf_Baseline, $egf_Stimulus), "\n";
}
# Initialize hash for GO conversion
my %geneSymbolToGo;
<INF3>;
# Read input file 3 line by line
while (<INF3>){
    # Get rid of whitespace
    chomp;
    # Discard comment lines
    if ($_ !~ /!/){
        # Split the line on tabs
        my @inf3Array = split("\t", $_);
        # Assign column 3 to hash key scalar
        my $geneSymbol = $inf3Array[2];
        # Assign column 4 to hash value scalar
        my $go = $inf3Array[3];
        # Assign key and value to hash
        my $geneSymbolToGo{$geneSymbol} = $go;
    }
}
# Open a filehandle to read file #3
open(INF4,"<",'FLYRNAi_data_baseline_vs_EGFSymbol.txt') or die $!;
# Read input file 4 line by line
while (<INF4>){
    # Remove end of line characters
    chomp;
    # Split the line on tabs
    my ($geneSymbol, $egf_Baseline, $egf_Stimulus), "\n";
    # Check if the gene symbol is present in the hash
    if (defined $geneSymbolToGo{$geneSymbol}){
        # Get the value associated with the codon from the hash
        $go = $geneSymbolToGo{$geneSymbol};
    }
    # Join data and print to output file
    print OUTF2 join( "\t", $go, $egf_Baseline, $egf_Stimulus), "\n";
}
Answer 0 (score: 1)
Always put
use strict;
use warnings 'all';
at the top of every Perl program. use diagnostics is not much use unless you are unable to understand the error messages as they stand.
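As a small illustration of what strict catches, the sketch below compiles a snippet containing a misspelled variable name and prints the resulting compile error. The variable names are invented for this example and do not come from the question.

```perl
use strict;
use warnings 'all';

# Compile a snippet that assigns to a misspelled, undeclared variable.
# String eval inherits the surrounding "use strict", so compilation fails
# and the error text is left in $@.
my $compiled = eval q{
    my $gene_symbol = 'CG17167';
    $gene_symbl = 'typo';    # undeclared: note the missing "o"
    1;
};

my $strict_error = $compiled ? '' : $@;
print "strict refused to compile the snippet:\n$strict_error";
```

Without strict, the misspelled name would silently become a new global variable and the typo would go unnoticed.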
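If you are doing a lot of disk operations, the autodie pragma (use autodie) saves you from having to write code to catch errors after every operation, such as or die $!.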
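A minimal sketch of autodie in action, assuming a path that does not exist (the path is hypothetical and chosen only to force a failure). The eval is there just so the example can print the exception instead of stopping.

```perl
use strict;
use warnings 'all';
use autodie;    # open, close and friends now throw an exception on failure

# Hypothetical path that should not exist, used only to trigger a failure.
my $missing = '/no/such/dir/no_such_file.tsv';

my $opened = eval {
    open my $fh, '<', $missing;    # no "or die $!" needed
    close $fh;
    1;
};

# The exception stringifies to a message that names the file and the reason.
my $error = $opened ? '' : "$@";
print "open failed as expected: $error\n" unless $opened;
```

In a real script you would normally leave out the eval and let the exception end the program with that message.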
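Always use lexical file handles, for example
open my $inf1_fh, '<', '/scratch/Drosophila/fb_synonym_fb_2014_05.tsv'
and name them better. Your code goes to two extremes: an overly verbose geneSymbolConversion for basic data, but INF1, INF2, and so on for the file handles. I don't know your application, but I'm sure it isn't hard to think of a name that reflects the purpose of the file, with _fh appended to indicate that it's a file handle.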
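Here is a sketch of a lexical file handle with a descriptive name. To keep it runnable without the real file under /scratch, it reads from an in-memory string; the handle name $synonym_fh and the two data rows are assumptions made for this example.

```perl
use strict;
use warnings 'all';

# Stand-in for the synonym file; the two rows are invented sample data.
my $synonym_data = "FBgn0000001\tgeneA\nFBgn0000002\tgeneB\n";

# Opening a reference to a scalar gives an in-memory file handle, so the
# sketch runs without the real file under /scratch.
open my $synonym_fh, '<', \$synonym_data or die $!;

my %conversion;
while ( my $line = <$synonym_fh> ) {
    chomp $line;
    my ( $gene_id, $gene_symbol ) = split /\t/, $line;
    $conversion{$gene_id} = $gene_symbol;
}
close $synonym_fh;

print "$_ => $conversion{$_}\n" for sort keys %conversion;
```

Unlike a bareword handle such as INF1, a lexical handle is scoped like any other my variable and is closed automatically when it goes out of scope.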
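You may run into problems if you use identifiers that start with a capital letter for local variables. People familiar with Perl will also thank you for avoiding capital letters in names altogether and using snake case, so %geneSymbolConversion is better written as %gene_symbol_conversion.
Your identifiers are also too long. The name of this hash can be shortened further to %conversion without any ambiguity.
The first parameter of split is a regular expression, and the second parameter defaults to $_, so
split("\t", $_)
is better written as
split /\t/
Your regular expression /(^FBgn\d+)/ captures the matched string, but the capture is never used, so you should write just /^FBgn\d+/.
I don't understand what you are doing in the while loop
while ( $INF1Array[0] =~ /(^FBgn\d+)/ ) { ... }
because $INF1Array[0] (which should be $inf1_array[0]) never changes in the loop body, so the loop will never terminate. My guess is that the while should be an if.
Use Perl's defined-or operator. Instead of
my $geneSymbol = "NA";
if ( defined $geneSymbolConversion{$geneID} ) {
    $geneSymbol = $geneSymbolConversion{$geneID};
}
you should write
my $gene_symbol = $conversion{$gene_id} // 'NA'
Here is how I would write something more Perlish and usable. It is far from a complex program, so I don't think it needs any comments at all. The vertical space they take up is more of an obstacle than the explanation they provide makes up for.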
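What follows is a minimal sketch of the GO lookup step with the suggestions above applied; it is an illustration, not the answerer's original code. It assumes that comment lines start with "!", reads from in-memory strings instead of the real files, and uses invented names (%go_for, $goa_fh, $symbol_fh) and invented expression values. The GOA row is modelled on the header row shown in the question. Note also that my declares whole variables, so it cannot be placed in front of a single hash element; doing so is reported as a syntax error near that element, which is consistent with the message quoted in the question.

```perl
use strict;
use warnings 'all';

# Invented sample input. The GOA row is modelled on the one shown in the
# question; every other value is made up for illustration.
my $goa_data = join "\n",
    '! comment line in the header',
    "UniProtKB\tA0A021WW37\tCG17167\tGO:0016021\tGO_REF:0000038",
    '';
my $symbol_data = "CG17167\t1.5\t2.5\nunknown_gene\t0.1\t0.2\n";

open my $goa_fh, '<', \$goa_data or die $!;

my %go_for;
while (<$goa_fh>) {
    chomp;
    next if /^!/;    # assumption: comment lines start with "!"
    my ( $gene_symbol, $go ) = ( split /\t/ )[ 2, 3 ];
    $go_for{$gene_symbol} = $go;    # plain assignment: no "my" on a hash element
}
close $goa_fh;

open my $symbol_fh, '<', \$symbol_data or die $!;

my @output;
while (<$symbol_fh>) {
    chomp;
    my ( $gene_symbol, $baseline, $stimulus ) = split /\t/;
    my $go = $go_for{$gene_symbol} // 'NA';    # defined-or supplies the default
    push @output, join "\t", $go, $baseline, $stimulus;
}
close $symbol_fh;

print "$_\n" for @output;
```

With the sample data, the first row gets its GO term and the second, whose symbol is not in the hash, falls back to NA.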