I have a nested hash extracted from a PDF, in this format:
[ { :page => 1,
:lines => [
{ :y => 774.0,
:text_groups => [ { :x => 18.0, :width => 421.59599999999995, :text => "XXXX" } ]
},
# ...
]
},
{ :page => 2,
:lines => [
{ :y => 774.0,
:text_groups => [ { :x => 18.0, :width => 421.59599999999995, :text => "XXXX" } ]
},
# ...
],
# ...
}
]
I want to get the :x and :y of a given :text from all 4 pages.
I tried:
require 'hashie'
coordinates.extend(Hashie::Extensions::DeepLocate)
@hash_array = Hash.new
@hash_array = coordinates.deep_locate -> (key, value, object) { key == :text && value == "XXXX" }
This gave me:
[ { :x => 18.0, :width => 421.59599999999995, :text => "XXXX" },
{ :x => 18.0, :width => 421.59599999999995, :text => "XXXX" },
{ :x => 18.0, :width => 421.59599999999995, :text => "XXXX" },
{ :x => 18.0, :width => 421.59599999999995, :text => "XXXX" } ]
But that only returns the matching text groups, which have no :y. I need the :x together with the :y for each match.
I will be using these values for further validation.
Answer 0: (score: 0)
I don't know whether you will accept a solution that doesn't use Hashie, but this is how I would do it:
data = [
{ :page => 1,
:lines => [
{ :y => 774.0,
:text_groups => [ { :x => 18.0, :width => 421.59599999999995, :text => "XXXX" } ]
},
# ...
]
},
{ :page => 2,
:lines => [
{ :y => 774.0,
:text_groups => [ { :x => 18.0, :width => 421.59599999999995, :text => "XXXX" } ]
},
# ...
],
# ...
}
]
SEARCH_TEXT = "XXXX"
coords = data.each_with_object([]) do |page, res|
  page[:lines].each do |line|
    line[:text_groups].each do |group|
      next unless group[:text] == SEARCH_TEXT
      res << { x: group[:x], y: line[:y] }
    end
  end
end
p coords
# => [ { :x => 18.0, :y => 774.0 },
# { :x => 18.0, :y => 774.0 } ]
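Since the values are going to be used for further validation, it may help to wrap the same traversal in a method and keep the page number with each match. The following is only a sketch under assumptions: `find_coords` is a hypothetical helper name, the `:page` key in the result is an addition that the question did not ask for, and `sample` is made-up data that mirrors the structure in the question (the "other" text group is there only to show that non-matching groups are skipped).

```ruby
# Hypothetical helper (not part of the original answer): returns one hash
# per text group whose :text equals search_text, carrying the page number,
# the group's :x and the enclosing line's :y.
def find_coords(pages, search_text)
  pages.flat_map do |page|
    page[:lines].flat_map do |line|
      line[:text_groups]
        .select { |group| group[:text] == search_text }
        .map { |group| { page: page[:page], x: group[:x], y: line[:y] } }
    end
  end
end

# Illustrative sample data shaped like the hash in the question.
sample = [
  { page: 1,
    lines: [
      { y: 774.0,
        text_groups: [{ x: 18.0, width: 421.59599999999995, text: "XXXX" }] }
    ] },
  { page: 2,
    lines: [
      { y: 774.0,
        text_groups: [{ x: 18.0, width: 421.59599999999995, text: "XXXX" },
                      { x: 50.0, width: 10.0, text: "other" }] }
    ] }
]

p find_coords(sample, "XXXX")
```

With this sample the method returns two hashes, one per page, and an empty array when the text does not occur. `flat_map` is used instead of `each_with_object` so that no accumulator has to be passed through the nested loops; both approaches visit the same elements in the same order.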