I have a data dump, of which the following is one line:
{,lat:26.3832456,distance:678.4075116373302,lon:120.4731951,address:tourism:viewpoint,},{,lat:26.3830149,distance:622.2862561842148,lon:120.473753,address:name:xe7,xbe,x85,xe6,xbc,xa2,xe5,x9d,xaa,tourism:viewpoint,},{,lat:26.3833609,distance:363.7364243757184,lon:120.4763708,address:name:xe5,x9c,x8b,xe4,xb9,x8b,xe5,x8c,x97,xe7,x96,x86,tourism:viewpoint,},{,lat:26.3823648,distance:223.60523114628876,lon:120.4821298,address:name:xe5,x90,x8e,xe6,xbe,xb3,natural:bay,},{,lat:26.3788243,distance:470.02293394005875,lon:120.480733,address:name:xe5,x90,x8e,xe6,xbe,xb3,xe5,xb1,xb1,source:GNS,natural:peak,},{,lat:26.3750042,distance:893.4290785528082,lon:120.4808826,address:name:xe8,x93,xae,xe8,x8a,xb1,xe5,x9c,x92,source:GNS,natural:peak,},{,lat:26.3763331,distance:742.92090763674,lon:120.4795115,address:name:xe8,xa5,xbf,xe5,xbc,x95,xe5,xb3,xb6,place:hamlet,source:GNS,},{,lat:26.378645,distance:623.327734488774,lon:120.4839399,address:source:PGS,natural:coastline,},{,lat:26.3801244,distance:418.6308872217763,lon:120.4772875,address:highway:residential,},{,lat:26.3791422,distance:434.6736862343828,lon:120.4792953,address:highway:residential,},{,lat:26.3779802,distance:739.2129423740619,lon:120.4751349,address:highway:unclassified,},{,lat:26.3770924,distance:675.0424314750977,lon:120.4815607,address:highway:residential,},{,lat:26.3760869,distance:798.0261247167285,lon:120.4821517,address:highway:path,},{,lat:26.3766434,distance:737.1372670528466,lon:120.4821003,address:highway:path,},{,lat:26.3813278,distance:384.84440601318613,lon:120.4766175,address:highway:path,},{,lat:26.3755092,distance:833.3985359252805,lon:120.4802778,address:highway:road,},{,lat:26.3785345,distance:496.6253230490143,lon:120.4799081,address:highway:road,}
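Each brace-delimited part (that is, each "{...}") holds information about one entity. I need to compare the distance field of every pair of braces and then display the contents of the pair with the smallest distance. For the example line above, I want to output the following: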
{,lat:26.3823648,distance:223.60523114628876,lon:120.4821298,address:name:xe5,x90,x8e,xe6,xbe,xb3,natural:bay,}
because it is the one with the smallest value in the distance field.
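How can I do this? I wrote the following code just to extract all the distances so that I could compare them, but even that doesn't work: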
require 'rubygems'
require 'mechanize'
require 'csv'
CSV.open('Output.csv', "wb") do |csv|
CSV.foreach('Original.csv', :headers=>true) do |row|
vector = row.split(",")
dist = vector.match("^.*\/distance:\/(.*)\/")
csv << dist
end
end
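My idea was to extract all the distances, compare them, find the smallest one, go back to the original string to find the braces with that particular distance, and then output the contents of those braces. But that seems like a convoluted way to do it. Is there a more elegant way to output the braces with the smallest distance? Thanks.
Answer 0: (score: 2)
Not very elegant, but it seems to work: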
s.scan(/\{[^{}]*\}/).min_by { |r| r =~ /distance:(.*),/; $1.to_f }
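where s is your initial data dump string.
scan splits the initial data into an array of records (anything between braces that is not itself a brace is treated as part of a record). min_by iterates over that array and returns the record for which the block passed to it gives the smallest value; here the block is just a regex match that finds the distance value in the record.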
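As a self-contained check of the scan/min_by approach, here is a minimal sketch that runs it on three records copied from the dump in the question. The variable names records and nearest are illustrative, and the capture group is narrowed to digits and dots (an assumption about the distance format) so that only the number itself is captured.

```ruby
# Sample: three records copied from the dump in the question.
s = '{,lat:26.3832456,distance:678.4075116373302,lon:120.4731951,address:tourism:viewpoint,},' \
    '{,lat:26.3823648,distance:223.60523114628876,lon:120.4821298,address:name:xe5,x90,x8e,xe6,xbe,xb3,natural:bay,},' \
    '{,lat:26.3801244,distance:418.6308872217763,lon:120.4772875,address:highway:residential,}'

# Each record is a "{...}" run that contains no nested braces.
records = s.scan(/\{[^{}]*\}/)

# Capture only the digits and dot after "distance:" and compare as floats.
nearest = records.min_by { |r| r[/distance:([\d.]+)/, 1].to_f }

puts nearest
```

With this sample, the record printed is the natural:bay one, whose distance is 223.60523114628876.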
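Answer 1: (score: 1)
Let str be a variable holding the given string.
The first step is to split the string on each comma that is preceded by a right brace and followed by a left brace: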
r0 = /
  (?<=})   # match a right brace in a positive lookbehind
  ,        # match a comma
  (?={)    # match a left brace in a positive lookahead
  /x       # free-spacing regex definition mode
arr = str.split(r0)
#=> ["{,lat:26.3832456,distance:678.4075116373302,lon:120.4731951,...}",
# "{,lat:26.3830149,distance:622.2862561842148,lon:120.473753,...}",
# ...
# "{,lat:26.3750042,distance:893.4290785528082,lon:120.4808826,...}",
# ...
# "{,lat:26.3785345,distance:496.6253230490143,lon:120.4799081,}"]
str.split(r0).size
#=> 17
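To try the split on a smaller input, the following sketch uses three records copied from the dump and compares the result with a brace-matching scan. The names by_split and by_scan are illustrative, and the braces are escaped here only for readability.

```ruby
# Three records copied from the dump in the question.
str = '{,lat:26.3801244,distance:418.6308872217763,lon:120.4772875,address:highway:residential,},' \
      '{,lat:26.3791422,distance:434.6736862343828,lon:120.4792953,address:highway:residential,},' \
      '{,lat:26.3779802,distance:739.2129423740619,lon:120.4751349,address:highway:unclassified,}'

# Split only on commas that sit between "}" and "{".
by_split = str.split(/(?<=\}),(?=\{)/)

# Alternative: collect every "{...}" run directly.
by_scan = str.scan(/\{[^{}]*\}/)

puts by_split.size        # 3
puts by_split == by_scan  # true
```

Commas inside a record are never both preceded by a right brace and followed by a left brace, so they are left alone by the split.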
Since the question asks for the smallest distance, we then apply min_by to that array, where min_by's block returns each string's distance, expressed as a float.
r1 = /
  (?<=,distance:)  # match ",distance:" in a positive lookbehind
  \d+              # match one or more digits
  \.               # match a decimal point
  \d+              # match one or more digits
  /x               # free-spacing regex definition mode
arr.min_by { |s| s[r1].to_f }
  #=> "{,lat:26.3823648,distance:223.60523114628876,lon:120.4821298,...}"
I have assumed that every string in the array contains a distance field. If some strings might not, the expression above would change to:
arr.min_by { |s| (s[r1] || Float::INFINITY).to_f }
One would also need to check that the string returned contains a distance field.
We can put this in a single expression.
str.split(/(?<=}),(?={)/).
  min_by { |s| (s[/(?<=,distance:)\d+\.\d+/] || Float::INFINITY).to_f }
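Since the question asks for the smallest distance, one way to avoid the separate "does the result contain a distance field" check is to filter first. The sketch below is an illustration, not part of the original answer: nearest_record is a hypothetical helper name, and the middle record in sample is a record from the dump with its distance field deliberately removed.

```ruby
# Hypothetical helper: returns the record with the smallest distance,
# or nil when no record has a distance field.
def nearest_record(str)
  distance = /(?<=,distance:)\d+\.\d+/
  str.split(/(?<=\}),(?=\{)/)
     .select { |s| s.match?(distance) }   # drop records without a distance
     .min_by { |s| s[distance].to_f }     # nil if nothing is left
end

# The middle record has had its distance field removed for illustration.
sample = '{,lat:26.378645,distance:623.327734488774,lon:120.4839399,address:source:PGS,natural:coastline,},' \
         '{,lat:26.3813278,lon:120.4766175,address:highway:path,},' \
         '{,lat:26.3813278,distance:384.84440601318613,lon:120.4766175,address:highway:path,}'

puts nearest_record(sample)
p nearest_record('{,lat:26.3813278,lon:120.4766175,}')
```

The first call prints the record with distance 384.84440601318613; the second prints nil because no record carries a distance.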