Question

Update (2):

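I changed the code to discard the comment lines in the header, but I'm still getting a syntax error at the hash key/value assignment: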

syntax error at ./convertDataToGeneSymbol.pl line 99, near "$geneSymbolToGo{"
syntax error at ./convertDataToGeneSymbol.pl line 101, near "}"

I can't seem to find any errors in the code, so could it be that the array can't read the value of $go?

Here is the header of input file 3:

!(10-20 lines of comments)

UniProtKB /t A0A021WW37 /t CG17167 /t GO:0016021 /t GO_REF:0000038
(Still learning how to format on this site; /t means tab-separated.)

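P.S. Sorry about all the comments. My professor requires extensive commenting in our coursework. The use strict pragma kept giving me problems with this program (mostly because of my inexperience), but when I removed it I got the results I wanted. Thanks for all the help so far!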

```
#!/usr/bin/perl
use warnings;
use diagnostics;

# Title: convertDataToGeneSymbol.pl
# Author: Nicholas Bense
# Date: 11/4/15

# Open a filehandle to read file #1
open(INF1,"<",'/scratch/Drosophila/fb_synonym_fb_2014_05.tsv' ) or die $!;

# Open a filehandle to read file #2
open(INF2,"<",'/scratch/Drosophila/FlyRNAi_data_baseline_vs_EGF.txt') or die $!;

# Open a filehandle to read file #3
open(INF3,"<",'/scratch/Drosophila/gene_association.goa_fly') or die $!;

# Open a filehandle to write new file
open(OUTF1,">",'FlyRNAi_data_baseline_vs_EGFSymbol.txt') or die $!;

# Open a filehandle to write new file
open(OUTF2,">",'FlyRNAi_data_baseline_vs_EGF_GO.txt') or die $!;

# Initialize a hash for the gene symbol conversion
my %geneSymbolConversion;

# Read input file 1 line by line
while (<INF1>){

# Get rid of whitespace
        chomp;

# Split the line
        my @inf1Array = split("\t", $_);

# Filter entries starting with FBgn
        if ($inf1Array[0] =~ /(^FBgn\d+)/){

# Assign column 1 to hash key scalar
        my $geneID = $inf1Array[0];

# Assign column 2 to hash value scalar
        my $geneSymbol = $inf1Array[1];

# Assign key and value to hash
        $geneSymbolConversion{$geneID} = $geneSymbol;

}

}

# Discard first line of input file 2
<INF2>;

# Read input file 2 line by line
while (<INF2>){


        # Get rid of whitespace
        chomp;

        # Split the line on tabs
        my ($geneID, $egf_Baseline, $egf_Stimulus) = split("\t", $_);

        # Check if the codon is present in the hash
        if (defined $geneSymbolConversion{$geneID}){

                # Get the value associated with the codon from the hash
                $geneSymbol = $geneSymbolConversion{$geneID};
        }

        # Join data and print to output file
        print OUTF1 join( "\t", $geneSymbol, $egf_Baseline, $egf_Stimulus), "\n";
}

# Initialize hash for GO conversion
my %geneSymbolToGo;

<INF3>;

# Read input file 3 line by line
while (<INF3>){

        # Get rid of whitespace
        chomp;

        # Discard comment lines
        if ($_ !~ /!/){

        # Split the line on tabs
        my @inf3Array = split("\t", $_);

        # Assign column 3 to hash key scalar
        my $geneSymbol = $inf3Array[2];

        # Assign column 4 to hash value scalar
        my $go = $inf3Array[3];

        # Assign key and value to hash
        my $geneSymbolToGo{$geneSymbol} = $go;
        }
}

# Open a filehandle to read file #3
open(INF4,"<",'FLYRNAi_data_baseline_vs_EGFSymbol.txt') or die $!;

# Read input file 4 line by line
while (<INF4>){

        # Remove end of line characters
        chomp;

        # Split the line on tabs
        my ($geneSymbol, $egf_Baseline, $egf_Stimulus), "\n";

        # Check if the gene symbol is present in the hash
        if (defined $geneSymbolToGo{$geneSymbol}){

                # Get the value associated with the codon from the hash
                $go = $geneSymbolToGo{$geneSymbol};

        }

        # Join data and print to output file
        print OUTF2 join( "\t", $go, $egf_Baseline, $egf_Stimulus), "\n";
}
```

Answer 1

Always put
```
use strict;
use warnings 'all';
```
at the top of every Perl program. The use diagnostics pragma is of little use unless you can't make sense of the messages those two produce.
If you are doing many disk operations, use autodie saves you from writing the sensible error-catching code, such as or die $!, after every one of them.
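A minimal sketch of what autodie buys you, using a hypothetical count_lines helper: the open carries no or die $!, yet a failure still stops the program, as an exception that eval can catch if you want to handle it.

```perl
#!/usr/bin/perl
use strict;
use warnings 'all';
use autodie;    # open, close and friends now die with a descriptive message on failure

# Hypothetical helper: count the lines in a file.
# No "or die $!" is needed: if open fails, autodie throws an exception.
sub count_lines {
    my ($path) = @_;
    open my $fh, '<', $path;
    my $count = 0;
    $count++ while <$fh>;
    close $fh;
    return $count;
}

# The exception can be caught with eval like any other die
my $ok = eval { count_lines('/no/such/file.txt'); 1 };
print "open failed: $@" unless $ok;
```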
Always use lexical file handles. For instance
```
open my $inf1_fh, '<', '/scratch/Drosophila/fb_synonym_fb_2014_05.tsv'
```
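and name them better. Your code goes to two extremes: it uses the very verbose geneSymbolConversion for basic data, but INF1, INF2 etc. for the file handles. I don't know your application, but I'm sure you could come up with something that reflects the purpose of each file, and add _fh to show that it's a file handle.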
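A minimal sketch of a descriptively named lexical handle. To stay self-contained it opens a reference to an in-memory string (hypothetical sample data) rather than the synonym file, and first_columns is a hypothetical helper. Because $synonyms_fh is an ordinary scalar, it can be passed to a subroutine, and it is closed automatically when it goes out of scope.

```perl
#!/usr/bin/perl
use strict;
use warnings 'all';

# Hypothetical sample data standing in for the synonym file
my $synonym_data = "FBgn0000001\tabc\nFBgn0000002\txyz\n";

# Hypothetical helper: return the first tab-separated column of every line.
# A lexical handle is an ordinary scalar, so it can be passed in like this.
sub first_columns {
    my ($fh) = @_;
    my @ids;
    while ( my $line = <$fh> ) {
        chomp $line;
        push @ids, ( split /\t/, $line )[0];
    }
    return @ids;
}

# Named for its purpose, with _fh marking it as a file handle.
# Opening a reference to a scalar reads from memory instead of from disk.
open my $synonyms_fh, '<', \$synonym_data or die $!;
my @gene_ids = first_columns($synonyms_fh);
close $synonyms_fh;

print "@gene_ids\n";    # prints "FBgn0000001 FBgn0000002"
```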
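You can run into trouble if you use identifiers that start with a capital letter for local variables. People familiar with Perl will also thank you for avoiding capital letters within names and using snake case instead, so %geneSymbolConversion is better written as %gene_symbol_conversion.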

Your identifiers are also much too long. We can shorten the name of this hash further, to %conversion, without any ambiguity.
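Applied to the question's %geneSymbolToGo, the snake-case advice gives %gene_symbol_to_go. The sketch below also shows the fix for the syntax error reported at line 99: my declares a whole variable, so the hash is declared once and its elements are then assigned without my. The (symbol, GO ID) pairs are hypothetical sample data.

```perl
#!/usr/bin/perl
use strict;
use warnings 'all';

# Declare the whole hash once, with my, before the loop
my %gene_symbol_to_go;

# Hypothetical (gene symbol, GO ID) pairs standing in for columns 3 and 4 of file 3
my @pairs = ( [ 'CG17167', 'GO:0016021' ], [ 'abc', 'GO:0005634' ] );

for my $pair (@pairs) {
    my ( $gene_symbol, $go ) = @$pair;

    # Assign an element without my. Writing my $gene_symbol_to_go{...} = ...
    # is a syntax error, because my can only declare whole variables.
    $gene_symbol_to_go{$gene_symbol} = $go;
}

print "$gene_symbol_to_go{CG17167}\n";    # prints "GO:0016021"
```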
The first parameter of split is a regular expression, and the second parameter defaults to $_, so it is better to write
```
split("\t", $_)
```
as
```
split /\t/
```
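A small sketch of split /\t/ relying on the default $_, the way it would look inside the loop over input file 2. The in-memory sample lines and the %baseline hash are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings 'all';

# Hypothetical sample lines in the layout of input file 2: ID, baseline, stimulus
my $rnai_data = "FBgn0000001\t1.25\t3.50\nFBgn0000002\t0.75\t0.80\n";
open my $rnai_fh, '<', \$rnai_data or die $!;

my %baseline;
while ( <$rnai_fh> ) {    # each line is read into $_
    chomp;                 # chomp defaults to $_
    my ( $gene_id, $egf_baseline, $egf_stimulus ) = split /\t/;    # same as split("\t", $_)
    $baseline{$gene_id} = $egf_baseline;
}
close $rnai_fh;

print "$_ => $baseline{$_}\n" for sort keys %baseline;
```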
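Your regular expression /(^FBgn\d+)/ captures the matched string but never uses the capture, so you should just write /^FBgn\d+/
I don't understand what you're doing in the while loop
```
while ( $INF1Array[0] =~ /(^FBgn\d+)/ ) { ... }
```
because $INF1Array[0] (which should be $inf1_array[0]) never changes within the body of the loop, so the loop will never terminate. My guess is that while should be if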
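A sketch of the filtering step with if and a capture-free pattern, using hypothetical in-memory lines in place of the synonym file:

```perl
#!/usr/bin/perl
use strict;
use warnings 'all';

# Hypothetical synonym-file lines: only the FBgn rows should be kept
my @lines = ( "FBgn0000001\tabc", "## comment", "FBgn0000002\txyz" );

my %conversion;
for (@lines) {
    my @fields = split /\t/;

    # "if" tests the condition once per line; "while" would re-test the same
    # unchanged $fields[0] forever. No capturing parentheses are needed.
    if ( $fields[0] =~ /^FBgn\d+/ ) {
        $conversion{ $fields[0] } = $fields[1];
    }
}

print join( ',', sort keys %conversion ), "\n";    # prints "FBgn0000001,FBgn0000002"
```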

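Use Perl's defined-or operator //. Instead of

```
my $geneSymbol = "NA";

if ( defined $geneSymbolConversion{$geneID} ) {
    $geneSymbol = $geneSymbolConversion{$geneID};
}
```

you should have

```
my $gene_symbol = $conversion{$gene_id} // 'NA';
```

Here's how I would write something more Perlish and usable. It is far from a complicated program, so I don't think it needs any comments at all. The vertical space they take up is more of a hindrance than anything they add in explanation.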
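The program the answer introduces with "Here's how I would write something more Perlish" is not preserved on this page. As a stand-in, here is a minimal sketch, not the answer's own code, that applies the points above. It assumes the column layouts described in the question (synonym file: FBgn ID then symbol; RNAi file: one header line, then ID, baseline, stimulus; GO file: comment lines starting with !, symbol in column 3, GO ID in column 4). The subroutine names and command-line usage are hypothetical. Like the question's code, it keeps only the last GO ID seen for each symbol. It builds both outputs in one pass instead of reading back FLYRNAi_data_baseline_vs_EGFSymbol.txt, which the question opens under a differently capitalized name than the file it wrote, and before OUTF1 has been closed.

```perl
#!/usr/bin/perl
use strict;
use warnings 'all';
use autodie;

# Sketch only: column layouts are assumed from the question's description.

# FBgn ID => gene symbol, from the synonym file
sub read_symbols {
    my ($synonyms_fh) = @_;
    my %symbol;
    while ( <$synonyms_fh> ) {
        next unless /^FBgn\d+/;    # keep only FBgn rows
        chomp;
        my ( $gene_id, $gene_symbol ) = split /\t/;
        $symbol{$gene_id} = $gene_symbol;
    }
    return \%symbol;
}

# Gene symbol => GO ID, from the GO association file
sub read_go_terms {
    my ($go_fh) = @_;
    my %go;
    while ( <$go_fh> ) {
        next if /^!/;    # assumed: comment lines start with "!"
        chomp;
        my ( $gene_symbol, $go_id ) = ( split /\t/ )[ 2, 3 ];
        $go{$gene_symbol} = $go_id if defined $go_id;    # element assignment: no my
    }
    return \%go;
}

# One row per data line: [ symbol, GO ID, baseline, stimulus ], 'NA' when unknown
sub annotate {
    my ( $data_fh, $symbol, $go ) = @_;
    my $header = <$data_fh>;    # discard the header line
    my @rows;
    while ( <$data_fh> ) {
        chomp;
        next unless /\S/;    # skip blank lines
        my ( $gene_id, $egf_baseline, $egf_stimulus ) = split /\t/;
        my $gene_symbol = $symbol->{$gene_id} // 'NA';
        my $go_id       = $go->{$gene_symbol} // 'NA';
        push @rows, [ $gene_symbol, $go_id, $egf_baseline, $egf_stimulus ];
    }
    return \@rows;
}

# Hypothetical usage: perl annotate.pl SYNONYMS DATA GO_FILE SYMBOL_OUT GO_OUT
if ( @ARGV == 5 ) {
    my ( $synonyms_path, $data_path, $go_path, $symbol_out, $go_out ) = @ARGV;
    open my $synonyms_fh, '<', $synonyms_path;
    open my $data_fh,     '<', $data_path;
    open my $go_fh,       '<', $go_path;
    my $rows = annotate( $data_fh, read_symbols($synonyms_fh), read_go_terms($go_fh) );

    open my $symbol_out_fh, '>', $symbol_out;
    open my $go_out_fh,     '>', $go_out;
    for my $row (@$rows) {
        my ( $gene_symbol, $go_id, @values ) = @$row;
        print {$symbol_out_fh} join( "\t", $gene_symbol, @values ), "\n";
        print {$go_out_fh} join( "\t", $go_id, @values ), "\n";
    }
}
```

Passing file handles into the subroutines, rather than opening files inside them, keeps each step testable with in-memory data.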
