Question

My regular expression currently looks like this:

(?<country>United States): (?<dial_number>\+1([ ]*[()\d\-\.]+)+)|(?<country>Australia): (?<dial_number>\+61([ ]*[()\d\-\.]+)+)|(?<country>Canada): (?<dial_number>\+1([ ]*[()\d\-\.]+)+)|(?<country>United Kingdom): (?<dial_number>\+44([ ]*[()\d\-\.]+)+)|(?<country>New Zealand): (?<dial_number>\+64([ ]*[()\d\-\.]+)+)

and it is applied to a string that looks like this (the numbers are fake):

Test Meeting 
Mon, Jan 15, 2018 10:00 AM - 5:00 PM AEST 

Please join my meeting from your computer, tablet or smartphone. 
https://example.com/join/50263834 

You can also dial in using your phone. 
Australia: +61 2 9037 3201 

Access Code: 204-761-833 

More phone numbers 
United States: +1 (571) 417-3429 
Austria: +43 7 1081 5425 
Belgium: +32 28 92 6018 
Canada: +1 (647) 467-9333 
Denmark: +45 32 72 01 62 
Finland: +358 523 16 0568 
France: +33 159 950 514 
Germany: +49 692 5536 7287 
Ireland: +353 12 360 548 
Italy: +39 0 237 92 48 01 
Netherlands: +31 107 841 377 
New Zealand: +64 9 260 6012 
Norway: +47 21 09 36 51 
Spain: +34 972 75 2103 
Sweden: +46 253 098 826 
Switzerland: +41 225 3290 67
United Kingdom: +44 17 3515 4021 

First Meeting? Let's do a quick system check: https://example.com/system-check

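I would like the regex to match in the order in which it is written: whichever country comes first in the pattern should be returned if it is present in the string. For example, if Australia is listed first, return that match; if United States is listed first, return that one.

Currently, the match returned is whichever one appears first in the string. In the example above, that is Australia.

Is there a way to return the earliest match according to the priority order of the alternatives in the regex?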

Answer 1

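Regular expressions are not well suited to this kind of ordering. I strongly believe you should match all the values in whatever order they occur, and then sort the results according to the order of a reference array.

Here is a small example: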

matches = {"Australia"=>"+61 2 9037 3201",
           "United States"=>"+1 (571) 417-3429",
           "Canada"=>"+1 (647) 467-9333",
           "New Zealand"=>"+64 9 260 6012",
           "United Kingdom"=>"+44 17 3515 4021"}

order = ["United States",
         "Australia",
         "Canada",
         "United Kingdom",
         "New Zealand"]

puts matches.sort_by { |element| order.index(element.first) }
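
The matches hash above is written out by hand. As a rough sketch of how the whole approach might fit together, the snippet below collects every "Country: +number" line with String#scan and then applies the same sort_by step. The helper name dial_ins_by_priority and the deliberately loose pattern are assumptions made for illustration, not part of the original answer.

```ruby
# Hypothetical helper: collect every "Country: +number" line, then order the
# hits by their position in the reference array `order`.
def dial_ins_by_priority(text, order)
  # Loose pattern (an assumption): a country name at the start of a line,
  # a colon, spaces, then "+" followed by digits, spaces, parentheses,
  # dashes or dots, ending on a digit.
  pattern = /^([\p{L} ]+):[ ]+(\+[\d ()\-.]*\d)/

  # scan returns [country, number] pairs in the order they occur in the text.
  matches = text.scan(pattern).to_h

  # Drop countries that are not in the priority list, then sort the rest.
  matches.select { |country, _| order.include?(country) }
         .sort_by { |country, _| order.index(country) }
end

text = <<~INVITE
  Australia: +61 2 9037 3201
  Access Code: 204-761-833
  United States: +1 (571) 417-3429
  Austria: +43 7 1081 5425
  Canada: +1 (647) 467-9333
INVITE

order = ["United States", "Australia", "Canada", "United Kingdom", "New Zealand"]

dial_ins_by_priority(text, order).first
  #=> ["United States", "+1 (571) 417-3429"]
```

Taking .first of the sorted result gives the highest-priority country that is actually present, even though Australia appears earlier in the text.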

Answer 2

Suppose we are given the following string.

str=<<BITTER_END
Test Meeting 
Mon, Jan 15, 2018 10:00 AM - 5:00 PM AEST 

Please join my meeting from your computer, tablet or smartphone. 
https://example.com/join/50263834 

You can also dial in using your phone. 
Australia: +61 2 9037 3201 

Access Code: 204-761-833 

More phone numbers 
United States: +1 (571) 417-3429 
Austria: +43 7 1081 5425 
Belgium: +32 28 92 6018 
Canada: +1 (647) 467-9333 
Denmark: +45 32 72 01 62 
Finland: +358 523 16 0568 
France: +33 159 950 514 
Germany: +49 692 5536 7287 
Ireland: +353 12 360 548 
Italy: +39 0 237 92 48 01 
Netherlands: +31 107 841 377 
New Zealand: +64 9 260 6012 
Norway: +47 21 09 36 51 
Spain: +34 972 75 2103 
Sweden: +46 253 098 826 
Switzerland: +41 225 3290 67
United Kingdom: +44 17 3515 4021

First Meeting? Let's do a quick system check: https://example.com/system-check
BITTER_END

I would be inclined to first create a hash from this string, whose keys are country names and whose values are phone numbers.

r = /
    ^                     # match start of line
    (?<country>[\p{L} ]+) # match >= 1 letters and spaces in named group country
    :[ ]+                 # match a colon and >= 1 spaces
    (?<dial_number>       # begin a named group dial_number
      \+                  # match a literal +
      (?:                 # begin a non-capture group
        # US and Canada
        1[ ]+             # match 1 followed by >= 1 spaces
        \(\d{3}\)         # match a left paren, 3 digits, a right paren
        [ ]+              # match >= 1 spaces
        \d{3}\-\d{4}      # match 3 digits, a dash and 4 digits
        |                 # or
        # rest of world
        \d{2,3}           # match 2 or 3 digits
        (?:               # begin a non-capture group
          [ ]+            # match >=1 spaces
          \d{1,4}         # match 1 to 4 digits
        ){3,5}            # close non-capture group and perform 3-5 times
      )                   # close non-capture group
    )                     # close named group dial_number
    /x                    # free-spacing regex definition mode

h = str.each_line.with_object({}) do |line, h|
  m = line.match r
  h[m[:country]] = m[:dial_number] unless m.nil?
end
  #=> {"Australia"=>"+61 2 9037 3201", "United States"=>"+1 (571) 417-3429",
  #    "Austria"=>"+43 7 1081 5425", "Belgium"=>"+32 28 92 6018",
  #    ...
  #    "Switzerland"=>"+41 225 3290 67", "United Kingdom"=>"+44 17 3515 4021"}

We can then retrieve phone numbers in the usual way.

h["United States"]
  #=> "+1 (571) 417-3429"

h["Shangri-La"]
  #=> nil

If you have a priority list of countries and want to find the first one that is a key in h, and then retrieve its phone number, do the following.

priority = ["Fiji", "Shangri-La", "United States", "Finland"]

country = priority.find { |country| h.key?(country) }
  #=> "United States"
country ? [country, h[country]] : nil
  #=> ["United States", "+1 (571) 417-3429"]
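
If only the single highest-priority number is needed, the intermediate hash could arguably be skipped by trying one small regex per country, in priority order, and stopping at the first hit. The following is a minimal sketch under that assumption; the method name first_dial_in and the loose number pattern are hypothetical and are not taken from the answers above.

```ruby
# Hypothetical helper: try each country in priority order and return the
# first [country, number] pair found in text, or nil if none is present.
def first_dial_in(text, priority)
  priority.each do |country|
    # Regexp.escape guards against metacharacters in the country name.
    m = text.match(/^#{Regexp.escape(country)}:[ ]+(?<dial_number>\+[\d ()\-.]*\d)/)
    return [country, m[:dial_number]] if m
  end
  nil
end

text = <<~INVITE
  Australia: +61 2 9037 3201
  Access Code: 204-761-833
  United States: +1 (571) 417-3429
  Finland: +358 523 16 0568
INVITE

first_dial_in(text, ["Fiji", "Shangri-La", "United States", "Finland"])
  #=> ["United States", "+1 (571) 417-3429"]

first_dial_in(text, ["Fiji"])
  #=> nil
```

This scans the text once per candidate country, so for long priority lists the hash-based approach above is likely the better choice.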

Ruby: Prioritizing some regex matches over others
