How do I compare numeric values inside a string and display one of them?

Time: 2017-02-04 14:58:07

Tags: ruby regex

I have a data dump in which the following is one line:

{,lat:26.3832456,distance:678.4075116373302,lon:120.4731951,address:tourism:viewpoint,},{,lat:26.3830149,distance:622.2862561842148,lon:120.473753,address:name:xe7,xbe,x85,xe6,xbc,xa2,xe5,x9d,xaa,tourism:viewpoint,},{,lat:26.3833609,distance:363.7364243757184,lon:120.4763708,address:name:xe5,x9c,x8b,xe4,xb9,x8b,xe5,x8c,x97,xe7,x96,x86,tourism:viewpoint,},{,lat:26.3823648,distance:223.60523114628876,lon:120.4821298,address:name:xe5,x90,x8e,xe6,xbe,xb3,natural:bay,},{,lat:26.3788243,distance:470.02293394005875,lon:120.480733,address:name:xe5,x90,x8e,xe6,xbe,xb3,xe5,xb1,xb1,source:GNS,natural:peak,},{,lat:26.3750042,distance:893.4290785528082,lon:120.4808826,address:name:xe8,x93,xae,xe8,x8a,xb1,xe5,x9c,x92,source:GNS,natural:peak,},{,lat:26.3763331,distance:742.92090763674,lon:120.4795115,address:name:xe8,xa5,xbf,xe5,xbc,x95,xe5,xb3,xb6,place:hamlet,source:GNS,},{,lat:26.378645,distance:623.327734488774,lon:120.4839399,address:source:PGS,natural:coastline,},{,lat:26.3801244,distance:418.6308872217763,lon:120.4772875,address:highway:residential,},{,lat:26.3791422,distance:434.6736862343828,lon:120.4792953,address:highway:residential,},{,lat:26.3779802,distance:739.2129423740619,lon:120.4751349,address:highway:unclassified,},{,lat:26.3770924,distance:675.0424314750977,lon:120.4815607,address:highway:residential,},{,lat:26.3760869,distance:798.0261247167285,lon:120.4821517,address:highway:path,},{,lat:26.3766434,distance:737.1372670528466,lon:120.4821003,address:highway:path,},{,lat:26.3813278,distance:384.84440601318613,lon:120.4766175,address:highway:path,},{,lat:26.3755092,distance:833.3985359252805,lon:120.4802778,address:highway:road,},{,lat:26.3785345,distance:496.6253230490143,lon:120.4799081,address:highway:road,}

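The part inside each pair of braces (i.e. "{...}") holds information about one entity. I need to compare the distance field of every pair of braces and then display the contents of the pair with the smallest distance. For example, for the sample line above, I want to output the following: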

{,lat:26.3823648,distance:223.60523114628876,lon:120.4821298,address:name:xe5,x90,x8e,xe6,xbe,xb3,natural:bay,}

because that is the one with the smallest value in the distance field.

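How can I do this? I wrote the following code just to extract all the distances so that I could compare them, but even that doesn't work: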

require 'rubygems'
require 'mechanize'
require 'csv'    
CSV.open('Output.csv', "wb") do |csv|
    CSV.foreach('Original.csv', :headers=>true) do |row|
        vector = row.split(",")    
        dist = vector.match("^.*\/distance:\/(.*)\/")    
        csv << dist
    end
end

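My idea was to extract all the distances, compare them, find the smallest one, go back to the original string to find the braces containing that particular distance, and then output the contents of those braces. But that seems like a convoluted way to do it. Is there a more elegant way to output the braces with the smallest distance? Thanks.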

2 answers:

Answer 0 (score: 2)

Not very elegant, but it seems to work:

s.scan(/\{[^{}]*\}/).min_by { |r| r =~ /distance:(.*),/; $1.to_f }

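where s is your initial data dump string.

scan splits the initial data into an array of records (anything between braces that is not itself a brace is treated as part of a record). min_by walks that array and returns the record for which the block passed as its argument gives the smallest value; in this case the block is just a regex match that finds the distance value in the record. The greedy (.*) captures more than the number, but to_f reads only the leading numeric part, so the comparison still works.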
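As a rough illustration of those two steps, here is a minimal sketch on a shortened sample. The `sample` string (abbreviated from the question's data), the variable names, and the stricter `[\d.]+` capture are assumptions made for illustration, not part of the answer above.

```ruby
# Shortened sample in the same shape as the dump (abbreviated from the question's data).
sample = '{,lat:26.3832456,distance:678.4075116373302,lon:120.4731951,address:tourism:viewpoint,},' \
         '{,lat:26.3823648,distance:223.60523114628876,lon:120.4821298,address:natural:bay,},' \
         '{,lat:26.3788243,distance:470.02293394005875,lon:120.480733,address:natural:peak,}'

# Step 1: scan returns every "{...}" group as one array element.
records = sample.scan(/\{[^{}]*\}/)

# Step 2: min_by picks the record whose block value (the distance as a Float) is smallest.
# The [\d.]+ capture stops at the end of the number instead of running to the last comma.
nearest = records.min_by { |r| r[/distance:([\d.]+)/, 1].to_f }

puts nearest
```

Using `r[regex, 1]` instead of `=~` and `$1` avoids relying on a global match variable, which may be easier to read in a longer script.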

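Answer 1 (score: 1)

Let str be a variable holding the given string.

The first step is to split the string on commas that are preceded by a right brace and followed by a left brace: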

r0 = /
     (?<=}) # match a right brace in a positive lookbehind
     ,      # match a comma
     (?={)  # match a left brace in a positive lookahead
     /x     # free-spacing regex definition mode

arr = str.split(r0)
  #=> ["{,lat:26.3832456,distance:678.4075116373302,lon:120.4731951,...}",
  #    "{,lat:26.3830149,distance:622.2862561842148,lon:120.473753,...}",
  #    ...
  #    "{,lat:26.3750042,distance:893.4290785528082,lon:120.4808826,...}",
  #    ...
  #    "{,lat:26.3785345,distance:496.6253230490143,lon:120.4799081,...}"]

str.split(r0).size
  #=> 17 
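To see why the lookarounds matter, the sketch below compares a plain split on the literal `},{` with the lookaround version. The two-record `sample` string and the variable names are hypothetical, chosen only to keep the output short.

```ruby
# Hypothetical two-record input in the same shape as the dump.
sample = '{,lat:1.0,distance:5.5,lon:2.0,},{,lat:3.0,distance:2.25,lon:4.0,}'

# Splitting on the literal "},{" consumes the braces, so the records lose them.
plain = sample.split('},{')
p plain
# => ["{,lat:1.0,distance:5.5,lon:2.0,", ",lat:3.0,distance:2.25,lon:4.0,}"]

# Lookbehind and lookahead only assert that the braces are there;
# just the comma between them is removed.
kept = sample.split(/(?<=\}),(?=\{)/)
p kept
# => ["{,lat:1.0,distance:5.5,lon:2.0,}", "{,lat:3.0,distance:2.25,lon:4.0,}"]
```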

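Then we apply min_by to that array, where min_by's block returns each string's distance as a float. (min_by rather than max_by, because the question asks for the smallest distance.)

r1 = /
     (?<=,distance:) # match ",distance:" in a positive lookbehind
     \d+             # match one or more digits
     \.              # match a decimal point
     \d+             # match one or more digits
     /x              # free-spacing regex definition mode

arr.min_by { |s| s[r1].to_f }
  #=> "{,lat:26.3823648,distance:223.60523114628876,lon:120.4821298,...}" 

I have assumed that every string in the array contains a distance field. If some strings might not, the expression above becomes:

arr.min_by { |s| (s[r1] || Float::INFINITY).to_f }

One would also need to check that the returned string contains a distance field.

We can put all of this in a single expression.

str.split(/(?<=}),(?={)/).
    min_by { |s| (s[/(?<=,distance:)\d+\.\d+/] || Float::INFINITY).to_f }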
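Since the question asks for the smallest distance, the fallback and the final check could be wrapped up roughly as follows. This is only a sketch: `nearest_record`, `DISTANCE`, and the `sample` input are hypothetical names and data introduced here, and records without a distance are treated as infinitely far away so that they are never preferred.

```ruby
# Hypothetical constant and helper names; not part of the answer above.
DISTANCE = /(?<=,distance:)\d+\.\d+/

def nearest_record(str)
  records = str.split(/(?<=\}),(?=\{)/)
  # Records without a distance field get Float::INFINITY, so they sort last.
  best = records.min_by { |s| (s[DISTANCE] || Float::INFINITY).to_f }
  # If no record had a distance at all, min_by still returns one, so check it.
  best if best && best[DISTANCE]
end

sample = '{,lat:1.0,lon:2.0,},{,lat:3.0,distance:7.5,lon:4.0,},{,lat:5.0,distance:2.25,lon:6.0,}'
puts nearest_record(sample)              # prints the record with distance:2.25
p nearest_record('{,lat:1.0,lon:2.0,}')  # prints nil
```

Returning nil when no record carries a distance leaves the decision about what to do in that case to the caller.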