Question

I want to parse the formal list from https://www.loc.gov/marc/bibliographic/ecbdlist.html into a nested structure of hashes and arrays.

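At first I used a recursive approach, but I ran into the problem that Ruby (and, BTW, Python too) can only handle fewer than 1000 recursive calls (stack level too deep).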

I found "slice_before", which looks great:

require 'pp'
# Read the list into an array and drop blank lines and "--...--" separator lines
marc = File.readlines('marc21.txt', encoding: 'utf-8')
           .map(&:chomp)
           .reject { |line| line.match?(/^\s*$/) || line.match?(/^--.+/) }
# Magic starts here: start a new group at every unindented line
marc = marc.slice_before { |line| line[/^ */].size == 0 }.to_a
# Use each group's first line as the key and the remaining lines as its value
marc = marc.inject({}) { |hash, arr| hash.merge(arr[0] => arr[1..-1]) }

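I now want to repeat these steps over the whole array. Because the indentation levels in the list vary ([0,2,3,4,5,6,8,9,10,12], and not all of these levels are always present), I use a helper method, get_indentation_map, so that each iteration slices only on the smallest indentation present.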

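But after adding just one more level (still far from the goal of converting the whole array into the new structure), I get the error "no implicit conversion of Regexp into Integer", and I don't see what causes it: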

def get_indentation_map( arr )
  arr.map { |line| line[/^ */].size }
end
# starting again after slice_before of the unindented lines (== 0)
marc =  marc.inject({}) do |hash, arr| 
  hash = hash.merge( arr[0] => arr[1..-1] ) # so far like above
  # now trying to do the same on the next level
  hash = hash.inject({}) do |h, a|
    indentation_map = get_indentation_map( a ).uniq.sort
    # only slice before smallest indentation
    a = a.slice_before { |line| line[/^ */].size == indentation_map[0] }.to_a 
    h = h.merge( a[0] => a[1..-1] )     
  end
  hash
end
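
A possible cause, as a minimal sketch rather than a confirmed diagnosis: iterating a Hash with inject yields [key, value] pairs, so the inner block variable a is a two-element array whose second element is itself an array. Calling line[/^ */] on that inner array invokes Array#[] with a Regexp, which raises the TypeError. The sample entry below is hypothetical and only serves to reproduce the message.

```ruby
# Same helper as in the question.
def get_indentation_map(arr)
  arr.map { |line| line[/^ */].size }
end

# Hypothetical sample entry: one key with one indented child line.
hash = { '001 - CONTROL NUMBER' => ['   (NR)'] }

error = nil
begin
  # Each pair is [key, value]; value is an Array, so value[/^ */] fails.
  hash.each { |pair| get_indentation_map(pair) }
rescue TypeError => e
  error = e
end

puts error.message
```

If that is indeed the cause, destructuring the pair (for example |h, (key, lines)|) and calling get_indentation_map on lines only would avoid the error.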

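I would be very grateful for hints on how best to parse this list. My goal is a JSON-like structure in which each entry is the key for its more deeply indented lines (if any). Thanks in advance.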
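
One way to get such a structure without recursion is to keep an explicit stack of [indentation, children] pairs and walk the lines once. The following is only a sketch under assumptions: the method name parse_indented and the sample lines are hypothetical, leaves are represented as empty hashes, and identical lines under the same parent would overwrite each other.

```ruby
# Build a nested Hash from indented lines using an explicit stack.
# Every line becomes a key; its value is a Hash of the deeper lines below it.
def parse_indented(lines)
  root = {}
  stack = [[-1, root]] # pairs of [indentation, children hash]
  lines.each do |line|
    # Skip blank lines and "--...--" separator lines, as in the question.
    next if line.match?(/^\s*$/) || line.match?(/^--.+/)

    indent = line[/^ */].size
    # Drop entries that are not ancestors of the current line.
    stack.pop while stack.last[0] >= indent
    children = {}
    stack.last[1][line.strip] = children
    stack.push([indent, children])
  end
  root
end

# Hypothetical sample in the style of the list, not copied from it.
sample = [
  '--Leader--',
  'LDR',
  '   Character Positions',
  '      00-04 - Record length',
  '      05 - Record status',
  '         a - Increased in encoding level',
  '001 - CONTROL NUMBER (NR)'
]

result = parse_indented(sample)
require 'pp'
pp result
```

Because only relative indentation is compared, the gaps between levels (0, 2, 3, 4, ...) do not need to be known in advance.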

Using "slice_before"

0 answers: