Question

Phone number data in various formats (I picked these because the incoming data is unreliable and does not arrive in the expected format):

+1 480-874-4666
404-581-4000
(805) 682-4726
978-851-7321, Ext 2606
413- 658-1100
(513) 287-7000,Toll Free (800) 733-2077
1 (813) 274-8130
212-363-3200,Media Relations: 212-668-2251.
323/221-2164

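My Ruby code extracts all the digits, removes any leading 1 for the US country code, then uses the first 10 digits to build the "new" phone number in the desired format: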

  nums = phone_number_string.scan(/[0-9]+/)
  if nums.size > 0
    all_nums = nums.join
    all_nums = all_nums[0..0] == "1" ? all_nums[1..-1] : all_nums
    if all_nums.size >= 10
      ten_nums = all_nums[0..9]
      final_phone = "#{ten_nums[0..2]}-#{ten_nums[3..5]}-#{ten_nums[6..9]}"
    else
      final_phone = ""
    end
    puts "#{final_phone}"
  else
    puts "No number to fix."
  end

It works great! Here are the results:

480-874-4666
404-581-4000
805-682-4726
978-851-7321
413-658-1100
513-287-7000
813-274-8130
212-363-3200
323-221-2164

However, I think there is a better way. Can you refactor this to be more efficient, more readable, or more useful?

Answer 1

Here is a much simpler approach that uses only a regex and substitution:

def extract_phone_number(input)
  if input.gsub(/\D/, "").match(/^1?(\d{3})(\d{3})(\d{4})/)
    [$1, $2, $3].join("-")
  end
end

This strips all non-digits (\D), skips an optional leading 1 (^1?), then extracts the first 10 remaining digits in chunks ((\d{3})(\d{3})(\d{4})) and formats them.
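To make those three steps concrete, here is a minimal sketch that exposes each intermediate value for one of the sample inputs. The method name trace_extraction is hypothetical and exists only for illustration; it is not part of the answer's code.

```ruby
# Hypothetical helper (not from the answer): returns each intermediate
# value of the strip / skip-leading-1 / chunk pipeline.
def trace_extraction(input)
  digits = input.gsub(/\D/, "")                     # step 1: keep digits only
  match  = digits.match(/^1?(\d{3})(\d{3})(\d{4})/) # steps 2 and 3: optional 1, then 3-3-4
  return { digits: digits, parts: nil, formatted: nil } unless match

  parts = match.captures                            # the three captured chunks
  { digits: digits, parts: parts, formatted: parts.join("-") }
end

p trace_extraction("1 (813) 274-8130")
# digits is "18132748130", parts is ["813", "274", "8130"],
# formatted is "813-274-8130"
```

Because the regex is anchored only at the start, any digits after the tenth (an extension or a second number) are simply ignored.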

Here are the tests:

test_data = {
  "+1 480-874-4666"                             => "480-874-4666",
  "404-581-4000"                                => "404-581-4000",
  "(805) 682-4726"                              => "805-682-4726",
  "978-851-7321, Ext 2606"                      => "978-851-7321",
  "413- 658-1100"                               => "413-658-1100",
  "(513) 287-7000,Toll Free (800) 733-2077"     => "513-287-7000",
  "1 (813) 274-8130"                            => "813-274-8130",
  "212-363-3200,Media Relations: 212-668-2251." => "212-363-3200",
  "323/221-2164"                                => "323-221-2164",
  ""                                            => nil,
  "foobar"                                      => nil,
  "1234567"                                     => nil,
}

test_data.each do |input, expected_output|
  extracted = extract_phone_number(input)
  print "FAIL (expected #{expected_output}): " unless extracted == expected_output
  puts extracted
end

Answer 2

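My approach is a bit different (and I think better, IMHO :-): I needed to not miss any phone numbers, even if there were 2 on a line. I also didn't want to match lines that had 3 groups of digits far apart from each other (see the cookies example), and I didn't want to mistake an IP address for a phone number.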

The code allows multiple numbers per line, but also requires the groups of digits to be "close" to each other:

def extract_phone_number(input)
  result = input.scan(/(\d{3})\D{0,3}(\d{3})\D{0,3}(\d{4})/).map{|e| e.join('-')}
  # <result> is an Array of whatever phone numbers were extracted, and the remapping
  # takes care of cleaning up each number in the Array into a format of 800-432-1234
  result = result.join(' :: ')
  # <result> is now a String, with the numbers separated by ' :: '
  # ... or there is another way to do it (see text below the code) that gets only the
  # first phone number.

  # Details of the Regular Expressions and what they're doing
  # 1. (\d{3}) -- get 3 digits (and keep them)
  # 2. \D{0,3} -- allow skipping of up to 3 non-digits. This handles hyphens, parentheses, periods, etc.
  # 3. (\d{3}) -- get 3 more digits (and keep them)
  # 4. \D{0,3} -- skip up to 0-3 non-digits
  # 5. (\d{4}) -- keep the final 4 digits

  result.empty? ? nil : result
end

Here are the tests (with a few extras added):

test_data = {
  "DB=Sequel('postgres://user:username@192.168.1.101/test_test')" => nil, # DON'T MISTAKE IP ADDRESSES AS PHONE NUMBERS
  "100 cookies + 950 cookes = 1050 cookies"     => nil,  # THIS IS NEW
  "this 123 is a 456 bad number 7890"           => nil,  # THIS IS NEW
  "212-363-3200,Media Relations: 212-668-2251." => "212-363-3200 :: 212-668-2251", # THIS IS CHANGED
  "this is +1 480-874-4666"                     => "480-874-4666",
  "something 404-581-4000"                      => "404-581-4000",
  "other (805) 682-4726"                        => "805-682-4726",
  "978-851-7321, Ext 2606"                      => "978-851-7321",
  "413- 658-1100"                               => "413-658-1100",
  "(513) 287-7000,Toll Free (800) 733-2077"     => "513-287-7000 :: 800-733-2077", # THIS IS CHANGED
  "1 (813) 274-8130"                            => "813-274-8130",
  "323/221-2164"                                => "323-221-2164",
  ""                                            => nil,
  "foobar"                                      => nil,
  "1234567"                                     => nil,
}

def test_it(test_data)
  test_data.each do |input, expected_output|
    extracted = extract_phone_number(input)
    puts "#{extracted == expected_output ? 'good': 'BAD!'}   ::#{input} => #{extracted.inspect}"
  end
end

test_it(test_data)

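Alternative implementation: because it uses scan, the regex is applied repeatedly, which is good if you expect more than 1 phone number per line. If you only want the first phone number on a line, you could also use: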

first_phone_number = begin
  m = /(\d{3})\D{0,3}(\d{3})\D{0,3}(\d{4})/.match(input)
  [m[1], m[2], m[3]].join('-')
rescue NoMethodError
  nil # no match: m is nil, so m[1] raises NoMethodError
end

(Just a different way of doing the same thing, using the Regexp match method.)

Answer 3

For numbers in the North American plan, you can extract the first number using phone_number_string.gsub(/\D/, '').match(/^1?(\d{10})/)[1]

For example:

test_phone_numbers = ["+1 480-874-4666",
                      "404-581-4000",
                      "(805) 682-4726",
                      "978-851-7321, Ext 2606",
                      "413- 658-1100",
                      "(513) 287-7000,Toll Free (800) 733-2077",
                      "1 (813) 274-8130",
                      "212-363-3200,Media Relations: 212-668-2251.",
                      "323/221-2164",
                      "foobar"]

test_phone_numbers.each do | phone_number_string | 
  match = phone_number_string.gsub(/\D/, '').match(/^1?(\d{10})/)
  puts(
    if (match)
      "#{match[1][0..2]}-#{match[1][3..5]}-#{match[1][6..9]}"
    else
      "No number to fix."
    end
  )
end

As with the starting code, this does not capture multiple numbers, e.g. "(513) 287-7000,Toll Free (800) 733-2077"

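FWIW, I have found it easier in the long run to capture and store the full number, including the country code and without separators; guess at capture time which numbering plan applies to numbers that are missing a prefix, and choose the format (e.g. NANP v. DE) when rendering.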
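A minimal sketch of that store-then-render idea, under stated assumptions: the names DEFAULT_COUNTRY_CODE, normalize_for_storage and render_nanp are hypothetical, a missing prefix is guessed to be NANP, and each input is assumed to hold a single number with no extension. Other numbering plans would need their own length rules and renderers.

```ruby
# Hypothetical names throughout; this is an illustration, not the answer's code.
DEFAULT_COUNTRY_CODE = "1" # assumption: numbers without a prefix are NANP

# Capture: store country code + national number, digits only.
def normalize_for_storage(input)
  digits = input.gsub(/\D/, "")
  case digits.length
  when 10 then DEFAULT_COUNTRY_CODE + digits          # prefix missing: guess NANP
  when 11 then digits.start_with?("1") ? digits : nil # already carries the 1
  else nil                                            # not handled by this sketch
  end
end

# Render: choose the display format only when the number is shown.
def render_nanp(stored)
  m = stored.match(/\A1(\d{3})(\d{3})(\d{4})\z/)
  m && m.captures.join("-")
end

stored = normalize_for_storage("(805) 682-4726")
puts stored              # 18056824726
puts render_nanp(stored) # 805-682-4726
```

Keeping the stored value free of separators means a different rendering (dots, parentheses, international prefix) can be added later without touching the data.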

Answer 4

This is an old thread, but I thought I would share a solution.

def extract_phone_number(input)
  input.delete!('^0-9').gsub!(/^1?(\d{3})(\d{3})(\d{4})/, '\1-\2-\3')[0..11]
rescue NoMethodError => e
  nil
end

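delete! removes all non-digit characters.

gsub! matches the digits and templates them into a hyphen-separated string.

[0..11] slices out the 12 characters we want (dropping any trailing extension digits).

The rescue block guards against calling a method on nil (delete! and gsub! return nil when they change nothing).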

Using the tests posted above:

tests = {
  '+1 480-874-4666'                             => '480-874-4666',
  '404-581-4000'                                => '404-581-4000',
  '(805) 682-4726'                              => '805-682-4726',
  '978-851-7321, Ext 2606'                      => '978-851-7321',
  '413- 658-1100'                               => '413-658-1100',
  '(513) 287-7000,Toll Free (800) 733-2077'     => '513-287-7000',
  '1 (813) 274-8130'                            => '813-274-8130',
  '212-363-3200,Media Relations: 212-668-2251.' => '212-363-3200',
  '323/221-2164'                                => '323-221-2164',
  ''                                            => nil,
  'foobar'                                      => nil,
  '1234567'                                     => nil
}

tests.each do |input, expected_output|
  input  = input.dup if input.frozen?
  result = extract_phone_number(input)

  if result == expected_output
    print "PASS: #{result}\n"
  else
    print "FAIL (expected #{expected_output})\n"
  end
end

# Console
=> PASS: 480-874-4666
=> PASS: 404-581-4000
=> PASS: 805-682-4726
=> PASS: 978-851-7321
=> PASS: 413-658-1100
=> PASS: 513-287-7000
=> PASS: 813-274-8130
=> PASS: 212-363-3200
=> PASS: 323-221-2164
=> PASS:
=> PASS:
=> PASS:
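
One caveat with the delete!/gsub! version above: delete! returns nil when it removes nothing, so an input that is already all digits, such as '4045814000', ends up in the rescue and comes back as nil. It also mutates the caller's string. A non-mutating variant could look like the sketch below; the name extract_phone_number_safe is hypothetical and the code is only one possible way to avoid both issues.

```ruby
# Hypothetical variant: non-bang methods, so nothing is mutated and no
# rescue is needed.
def extract_phone_number_safe(input)
  digits = input.delete('^0-9')                  # returns a new String, never nil
  m = digits.match(/\A1?(\d{3})(\d{3})(\d{4})/)  # optional leading 1, then 3-3-4
  m && m.captures.join('-')                      # nil when there is no match
end

p extract_phone_number_safe('4045814000')             # "404-581-4000"
p extract_phone_number_safe('978-851-7321, Ext 2606') # "978-851-7321"
p extract_phone_number_safe('foobar')                 # nil
```

Since the input is never modified, the input.dup step in the test loop is not needed with this variant.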

Better way to extract phone number and reformat?
