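I am trying to parse some data from a file and store it in a hash map, using string comparison rather than regular expressions, but I keep running into errors that I have tried and failed to fix.
The structure of the file is something like: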
"key" + "double colon" + "value"
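on each line. This structure repeats throughout the file: every entry has an "id" key, almost every entry has at least one "is_a" key, and some also have "is_obsolete" and "replaced_by" keys.
I am trying to parse it like this: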
def get_hpo_data(hpofile="hp.obo")
  hpo_data = Hash.new() # Hash map where I want to store all IDs
  File.readlines(hpofile).each do |line|
    if line.start_with? "id:" # If line is an ID
      hpo_id = line[4..13] # Store ID value
      hpo_data[hpo_id] = Hash.new() # Setting up hash map for that ID
      hpo_data[hpo_id]["parents"] = Array.new()
    elsif line.start_with? "is_obsolete:" # If the ID is obsolete
      hpo_data[hpo_id]["is_obsolete"] = true # Store value in the hash
    elsif line.start_with? "replaced_by:" # If the ID is obsolete
      # Store the ID term it was replaced by
      hpo_data[hpo_id]["replaced_by"] = line[13..22]
    elsif line.start_with? "is_a:" # If the ID has a parent ID
      # Store the parent(s) in the array initialized before
      hpo_data[hpo_id]["parents"].push(line[6..15])
    end
  end
  return hpo_data
end
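The structure I expect to create is a global hash in which each ID is itself a hash holding different data (a string, a boolean, and an array whose length depends on how many parent IDs that term has), but I get the following error: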
table_combination.rb:224:in `block in get_hpo_data': undefined method `[]=' for nil:NilClass (NoMethodError)
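This time the error points to the replaced_by elsif statement, but I get the same error from any of the other elsif statements, so the code fails to parse the "is_obsolete", "replaced_by" and "is_a" attributes. If I remove those statements, the code successfully creates the global hash with each ID term as a hash.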
I also tried giving each hash a default value, but that did not solve the problem. I even got a new error that I had never seen before:
table_combination.rb:233:in '[]': no implicit conversion of String into Integer (TypeError)
on this line:
hpo_data[hpo_id]["parents"].push(line[6..15])
Here is a sample of the file showing two terms, with the different keys I want to handle:
[Term]
id: HP:0002578
name: Gastroparesis
def: "Decreased strength of the muscle layer of stomach, which leads to a decreased ability to empty the contents of the stomach despite the absence of obstruction." [HPO:probinson]
subset: hposlim_core
synonym: "Delayed gastric emptying" EXACT layperson [ORCID:0000-0001-5208-3432]
xref: MSH:D018589
xref: SNOMEDCT_US:196753007
xref: SNOMEDCT_US:235675006
xref: UMLS:C0152020
is_a: HP:0002577 ! Abnormality of the stomach
is_a: HP:0011804 ! Abnormal muscle physiology
[Term]
id: HP:0002564
name: obsolete Malformation of the heart and great vessels
is_obsolete: true
replaced_by: HP:0030680
Answer 0 (score: 0)
There may be more bugs hidden in your code, but one problem is indeed that your hpo_data has no default value.
Calling hpo_data[hpo_id]["replaced_by"] = line[13..22] fails if hpo_id has not been initialized.
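A small sketch of why hpo_id can be uninitialized at that point. This goes slightly beyond what the answer states: in Ruby, a variable first assigned inside a block is local to each run of that block, so it does not keep its value from one line of the file to the next. The names seen_ids, plain, lookup and failure are only for illustration.

```ruby
seen_ids = []
["id: HP:0002564", "is_obsolete: true"].each do |line|
  if line.start_with?("id:")
    hpo_id = line[4..13] # first assigned here, so it is local to the block
  end
  seen_ids << hpo_id     # nil whenever the branch above did not run
end
# seen_ids now holds the ID from the first line and nil from the second

plain = {}               # a plain Hash, like Hash.new() in the question
lookup = plain[nil]      # a missing key yields nil
begin
  plain[nil]["is_obsolete"] = true # nil has no []= method
rescue NoMethodError => e
  failure = e.class
end
```

On the second line hpo_id is nil, the lookup returns nil, and assigning into it raises the NoMethodError from the question.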
You can define hpo_data like this:
hpo_data = Hash.new { |hash, key| hash[key] = {'parents' => [] } }
and then remove
hpo_data = Hash.new() #Hash map where i want to store all IDs
and
hpo_data[hpo_id] = Hash.new() #Setting up hash map for that ID
hpo_data[hpo_id]["parents"] = Array.new()
Whenever hpo_data[hpo_id] is called for an ID that is not yet in the hash, that entry is automatically defined as {"parents"=>[]}.
For example:
hpo_data = Hash.new { |hash, key| hash[key] = {'parents' => [] } }
# => {}
hpo_data[1234]
# => {"parents"=>[]}
hpo_data[1234]["parents"] << 6
# => [6]
hpo_data
# => {1234=>{"parents"=>[6]}}
hpo_data[42]["is_obsolete"] = true
# => true
hpo_data
# => {1234=>{"parents"=>[6]}, 42=>{"parents"=>[], "is_obsolete"=>true}}
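Putting the pieces together, here is a minimal sketch of the parser with the default block applied. It rests on two assumptions that are not part of the answer above. First, hpo_id is declared before the loop so that it keeps its value across lines; with the default block alone, lines following an id: line would be stored under a nil key instead of raising. Second, parse_hpo_lines is a hypothetical helper that takes an array of lines rather than a file path, so it can be tried without hp.obo.

```ruby
# Hypothetical helper: parses OBO-style lines into a hash keyed by term ID.
def parse_hpo_lines(lines)
  hpo_data = Hash.new { |hash, key| hash[key] = { "parents" => [] } }
  hpo_id = nil # declared outside the block so it persists between lines

  lines.each do |line|
    if line.start_with?("id:")
      hpo_id = line[4..13]
      hpo_data[hpo_id] # reading the key creates its default entry
    elsif hpo_id.nil?
      next # ignore anything that appears before the first id: line
    elsif line.start_with?("is_obsolete:")
      hpo_data[hpo_id]["is_obsolete"] = true
    elsif line.start_with?("replaced_by:")
      hpo_data[hpo_id]["replaced_by"] = line[13..22]
    elsif line.start_with?("is_a:")
      hpo_data[hpo_id]["parents"].push(line[6..15])
    end
  end
  hpo_data
end

# Lines taken from the sample in the question.
sample = [
  "[Term]",
  "id: HP:0002578",
  "name: Gastroparesis",
  "is_a: HP:0002577 ! Abnormality of the stomach",
  "is_a: HP:0011804 ! Abnormal muscle physiology",
  "[Term]",
  "id: HP:0002564",
  "name: obsolete Malformation of the heart and great vessels",
  "is_obsolete: true",
  "replaced_by: HP:0030680"
]
result = parse_hpo_lines(sample)
```

To run it against the real file, something like parse_hpo_lines(File.readlines("hp.obo")) should work, assuming the file follows the layout shown in the question.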