I have the following text file:
CUI|SDUI|HpoStr|MedGenStr|MedGenStr_SAB|STY|
CN000002|HP:0000001|All|All|HPO|Finding|
CN000003|HP:0000002|Abnormality of body height|Abnormality of body height|GTR|Finding|
CN000004|HP:0000003|Multicystic kidney dysplasia|Multicystic kidney dysplasia|GTR|Finding|
CN000006|HP:0000005|Mode of inheritance|Mode of inheritance|HPO|Finding|
C0443147|HP:0000006|Autosomal dominant inheritance|Autosomal dominant inheritance|GTR|Intellectual Product|
C0441748|HP:0000007|Autosomal recessive inheritance|Autosomal recessive inheritance|HPO|Intellectual Product|
CN000009|HP:0000008|Abnormality of female internal genitalia|Abnormality of female internal genitalia|GTR|Finding|
I want to parse it with Perl. This is what I have so far:
#!/usr/bin/perl
open (FILE, 'filename.txt');
while (<FILE>) {
    chomp;
    ($CUI, $SDUI, $HpoStr, $MedGenStr, $MedGenStr_SAB, $STY) = split("\t");
    print "CUI: $CUI\n";
    print "SDUI: $SDUI\n";
    print "HpoStr: $HpoStr\n";
    print "MedGenStr: $MedGenStr\n";
    print "MedGenStr_SAB: $MedGenStr_SAB\n";
    print "STY: $STY\n";
    print "---------\n";
}
close (FILE);
exit;
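When I run it from the nano editor I do get output, but when I run it with a command like perl filename.pl I get a lot of errors. I would like to know whether my code is wrong or whether there is a better way to structure it.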
In the code above I read the input from a separate .txt file containing the data shown above.
If I want to pass the file as an input option, how do I go about it? The file can be as large as 1 GB.
The first line holds the headers, and I need to associate each entry with them.
Answer 0 (score: 1)
Your columns are separated by pipes (|), not tabs, so you need to split on that:
use strict;
use warnings;
use Data::Dump;
while (<DATA>) {
    chomp;
    my @fields = split(/\|/, $_);
    dd(\@fields);
}
__DATA__
CUI|SDUI|HpoStr|MedGenStr|MedGenStr_SAB|STY|
CN000002|HP:0000001|All|All|HPO|Finding|
CN000003|HP:0000002|Abnormality of body height|Abnormality of body height|GTR|Finding|
CN000004|HP:0000003|Multicystic kidney dysplasia|Multicystic kidney dysplasia|GTR|Finding|
CN000006|HP:0000005|Mode of inheritance|Mode of inheritance|HPO|Finding|
C0443147|HP:0000006|Autosomal dominant inheritance|Autosomal dominant inheritance|GTR|Intellectual Product|
C0441748|HP:0000007|Autosomal recessive inheritance|Autosomal recessive inheritance|HPO|Intellectual Product|
CN000009|HP:0000008|Abnormality of female internal genitalia|Abnormality of female internal genitalia|GTR|Finding|
Output:
["CUI", "SDUI", "HpoStr", "MedGenStr", "MedGenStr_SAB", "STY"]
["CN000002", "HP:0000001", "All", "All", "HPO", "Finding"]
[
  "CN000003",
  "HP:0000002",
  "Abnormality of body height",
  "Abnormality of body height",
  "GTR",
  "Finding",
]
[
  "CN000004",
  "HP:0000003",
  "Multicystic kidney dysplasia",
  "Multicystic kidney dysplasia",
  "GTR",
  "Finding",
]
...
If you want to supply a file to read, just change while (<DATA>) to while (<>) and run the script like this:
perl script.pl input.txt
If you need to access fields by name, you need a hash:
my @headers;
while (<DATA>) {
    chomp;
    my @fields = split(/\|/, $_);
    if ($. == 1) {
        @headers = @fields;
        next;
    }
    my %data;
    @data{@headers} = @fields;
    dd(\%data);
}
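For a file around 1 GB, the header-keyed hash approach can be wrapped in a sub that reads one line at a time from an explicitly opened handle, so memory use stays flat. The following is only a minimal sketch: the helper name each_record and its callback interface are hypothetical (they are not part of the answer), and it assumes every data row has the same pipe-delimited layout as the header line.

```perl
use strict;
use warnings;

# Hypothetical helper: calls $callback with a hash ref for each data row.
# Only one line is held in memory at a time.
sub each_record {
    my ($fh, $callback) = @_;
    my @headers;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r\z//;                # tolerate Windows line endings
        next unless length $line;         # skip blank lines
        my @fields = split /\|/, $line;   # split drops the trailing empty field
        if (!@headers) {                  # first non-blank line holds the column names
            @headers = @fields;
            next;
        }
        my %record;
        @record{@headers} = @fields;      # hash slice: header name => field value
        $callback->(\%record);
    }
    return;
}

# Runs only when executed directly with a file name: perl script.pl input.txt
if (!caller && @ARGV) {
    my $path = shift @ARGV;
    open(my $fh, '<', $path) or die "Cannot open '$path': $!";
    each_record($fh, sub {
        my ($rec) = @_;
        print "$rec->{CUI}: $rec->{HpoStr}\n";
    });
    close($fh);
}
```

Using a lexical filehandle with three-argument open and an "or die" check also makes a wrong or missing file name fail loudly instead of silently producing no output.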
That said, it looks like you are fast approaching the point where using Text::CSV would be better than trying to do this by hand.
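As a rough illustration of that suggestion, here is a sketch that reads the same pipe-delimited layout with Text::CSV (a CPAN module, assumed to be installed). The helper name each_csv_record is hypothetical, and dropping the trailing empty column is an assumption based on each sample line ending in a pipe.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical helper: parses pipe-delimited rows with Text::CSV and
# calls $callback with a hash ref keyed by the header names.
sub each_csv_record {
    my ($fh, $callback) = @_;
    my $csv = Text::CSV->new({ sep_char => '|', binary => 1, auto_diag => 1 })
        or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

    my $header = $csv->getline($fh) or return;
    my @names = @$header;
    # Each line ends with '|', so the parser reports one empty trailing column.
    pop @names while @names && $names[-1] eq '';

    while (my $row = $csv->getline($fh)) {
        my %record;
        @record{@names} = @$row;   # the extra trailing value is discarded
        $callback->(\%record);
    }
    return;
}

# Runs only when executed directly with a file name: perl script.pl input.txt
if (!caller && @ARGV) {
    my $path = shift @ARGV;
    open(my $fh, '<', $path) or die "Cannot open '$path': $!";
    each_csv_record($fh, sub {
        my ($rec) = @_;
        print "$rec->{SDUI}\t$rec->{MedGenStr}\n";
    });
    close($fh);
}
```

The main gain over a hand-written split is that quoting, embedded separators, and malformed rows are handled, or at least reported, by the module rather than by ad hoc code.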