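Phone number data in a variety of formats (I picked these because the incoming data is unreliable and does not arrive in the expected format):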
+1 480-874-4666
404-581-4000
(805) 682-4726
978-851-7321, Ext 2606
413- 658-1100
(513) 287-7000,Toll Free (800) 733-2077
1 (813) 274-8130
212-363-3200,Media Relations: 212-668-2251.
323/221-2164
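My Ruby code extracts all the digits, drops any leading 1 for the US country code, then uses the first 10 digits to build the "new" phone number in the desired format: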
nums = phone_number_string.scan(/[0-9]+/)
if nums.size > 0
  all_nums = nums.join
  all_nums = all_nums[0..0] == "1" ? all_nums[1..-1] : all_nums
  if all_nums.size >= 10
    ten_nums = all_nums[0..9]
    final_phone = "#{ten_nums[0..2]}-#{ten_nums[3..5]}-#{ten_nums[6..9]}"
  else
    final_phone = ""
  end
  puts "#{final_phone}"
else
  puts "No number to fix."
end
The results look good:
480-874-4666
404-581-4000
805-682-4726
978-851-7321
413-658-1100
513-287-7000
813-274-8130
212-363-3200
323-221-2164
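However, I think there is a better way. Can you refactor this to be more efficient, more readable, or more useful?
Answer 0 (score: 13)
Here is a much simpler approach that uses only a regular expression and substitution: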
def extract_phone_number(input)
  if input.gsub(/\D/, "").match(/^1?(\d{3})(\d{3})(\d{4})/)
    [$1, $2, $3].join("-")
  end
end
This strips all non-digits (\D), skips an optional leading 1 (^1?), then extracts the first 10 remaining digits in chunks ((\d{3})(\d{3})(\d{4})) and formats them.
Here are the tests:
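To make those three steps concrete, the sketch below runs them one at a time and exposes the intermediate values. The helper name `phone_steps` and its hash return value are made up for illustration and are not part of the answer.

```ruby
# Illustrative helper (hypothetical name): shows each intermediate step.
def phone_steps(input)
  digits    = input.gsub(/\D/, "")                      # 1. strip every non-digit
  match     = digits.match(/^1?(\d{3})(\d{3})(\d{4})/)  # 2. skip optional leading 1, capture 3-3-4
  formatted = match && match.captures.join("-")         # 3. join the groups, or nil if no match
  { digits: digits, formatted: formatted }
end

steps = phone_steps("1 (813) 274-8130")
puts steps[:digits]     # 18132748130
puts steps[:formatted]  # 813-274-8130
```

Note that trailing digits, such as the extension in "978-851-7321, Ext 2606", stay in the digit string but are ignored because the pattern is anchored only at the start.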
test_data = {
"+1 480-874-4666" => "480-874-4666",
"404-581-4000" => "404-581-4000",
"(805) 682-4726" => "805-682-4726",
"978-851-7321, Ext 2606" => "978-851-7321",
"413- 658-1100" => "413-658-1100",
"(513) 287-7000,Toll Free (800) 733-2077" => "513-287-7000",
"1 (813) 274-8130" => "813-274-8130",
"212-363-3200,Media Relations: 212-668-2251." => "212-363-3200",
"323/221-2164" => "323-221-2164",
"" => nil,
"foobar" => nil,
"1234567" => nil,
}
test_data.each do |input, expected_output|
  extracted = extract_phone_number(input)
  print "FAIL (expected #{expected_output}): " unless extracted == expected_output
  puts extracted
end
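Answer 1 (score: 2)
My approach is a bit different (and better, IMHO :-): I needed to not miss any phone numbers, even if there were two on a line. I also didn't want to match lines that had three groups of digits far apart from each other (see the cookies example), and I didn't want to mistake an IP address for a phone number.
The code allows multiple numbers per line, but also requires the groups of digits to be "close" to each other: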
def extract_phone_number(input)
  result = input.scan(/(\d{3})\D{0,3}(\d{3})\D{0,3}(\d{4})/).map { |e| e.join('-') }
  # <result> is an Array of whatever phone numbers were extracted, and the remapping
  # takes care of cleaning up each number in the Array into a format of 800-432-1234
  result = result.join(' :: ')
  # <result> is now a String, with the numbers separated by ' :: '
  # ... or there is another way to do it (see text below the code) that gets only the
  # first phone number.
  # Details of the regular expression and what it is doing:
  # 1. (\d{3}) -- get 3 digits (and keep them)
  # 2. \D{0,3} -- allow skipping of up to 3 non-digits. This handles hyphens, parentheses, periods, etc.
  # 3. (\d{3}) -- get 3 more digits (and keep them)
  # 4. \D{0,3} -- skip 0 to 3 non-digits
  # 5. (\d{4}) -- keep the final 4 digits
  result.empty? ? nil : result
end
Here are the tests (with a few extra ones):
test_data = {
"DB=Sequel('postgres://user:username@192.168.1.101/test_test')" => nil, # DON'T MISTAKE IP ADDRESSES AS PHONE NUMBERS
"100 cookies + 950 cookes = 1050 cookies" => nil, # THIS IS NEW
"this 123 is a 456 bad number 7890" => nil, # THIS IS NEW
"212-363-3200,Media Relations: 212-668-2251." => "212-363-3200 :: 212-668-2251", # THIS IS CHANGED
"this is +1 480-874-4666" => "480-874-4666",
"something 404-581-4000" => "404-581-4000",
"other (805) 682-4726" => "805-682-4726",
"978-851-7321, Ext 2606" => "978-851-7321",
"413- 658-1100" => "413-658-1100",
"(513) 287-7000,Toll Free (800) 733-2077" => "513-287-7000 :: 800-733-2077", # THIS IS CHANGED
"1 (813) 274-8130" => "813-274-8130",
"323/221-2164" => "323-221-2164",
"" => nil,
"foobar" => nil,
"1234567" => nil,
}
def test_it(test_data)
  test_data.each do |input, expected_output|
    extracted = extract_phone_number(input)
    puts "#{extracted == expected_output ? 'good' : 'BAD!'} ::#{input} => #{extracted.inspect}"
  end
end
test_it(test_data)
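Alternative implementation: by using `scan`, the regular expression is automatically applied multiple times, which is good if you expect more than one phone number per line. If you only want the first phone number on a line, you could also use: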
m = /(\d{3})\D{0,3}(\d{3})\D{0,3}(\d{4})/.match(input)
# match returns nil when nothing matches, so guard before joining the groups
first_phone_number = m && [m[1], m[2], m[3]].join('-')
(This uses the Regexp `match` method; it is just a different way of doing the same thing.)
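As a small variation on the `scan` version, the matches can be returned as an Array instead of a ' :: '-joined String, which leaves the choice of separator to the caller. The method name `extract_phone_numbers` and the `max_gap` keyword are assumptions made for this sketch, not part of the answer; `max_gap` only exposes the "closeness" limit that is hard-coded as 3 above.

```ruby
# Hypothetical variant: returns an Array of formatted numbers (empty if none).
# max_gap is the largest run of non-digits allowed between digit groups.
def extract_phone_numbers(input, max_gap: 3)
  pattern = /(\d{3})\D{0,#{max_gap}}(\d{3})\D{0,#{max_gap}}(\d{4})/
  input.scan(pattern).map { |groups| groups.join("-") }
end

extract_phone_numbers("(513) 287-7000,Toll Free (800) 733-2077")
# => ["513-287-7000", "800-733-2077"]
extract_phone_numbers("100 cookies + 950 cookes = 1050 cookies")
# => []
extract_phone_numbers("this 123 is a 456 bad number 7890", max_gap: 20)
# => ["123-456-7890"]
```

The last call shows why the gap limit matters: once it is loosened, unrelated groups of digits start to look like a phone number.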
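Answer 2 (score: 0)
For numbers in the North American plan, you can use phone_number_string.gsub(/\D/, '').match(/^1?(\d{10})/)[1] (the trailing [1] raises NoMethodError when nothing matches, so check the match first).
For example: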
test_phone_numbers = ["+1 480-874-4666",
"404-581-4000",
"(805) 682-4726",
"978-851-7321, Ext 2606",
"413- 658-1100",
"(513) 287-7000,Toll Free (800) 733-2077",
"1 (813) 274-8130",
"212-363-3200,Media Relations: 212-668-2251.",
"323/221-2164",
"foobar"]
test_phone_numbers.each do |phone_number_string|
  match = phone_number_string.gsub(/\D/, '').match(/^1?(\d{10})/)
  puts(
    if match
      "#{match[1][0..2]}-#{match[1][3..5]}-#{match[1][6..9]}"
    else
      "No number to fix."
    end
  )
end
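As with the starting code, this does not capture multiple numbers, e.g. "(513) 287-7000,Toll Free (800) 733-2077".
FWIW, I find it easier in the long run to capture and store full numbers, including the country code and with no separators: guess at capture time which numbering plan a number with a missing prefix belongs to, and choose the format (e.g. NANP vs. DE) when rendering.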
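One way to read that suggestion is sketched below: normalize at capture time to a digits-only string that includes the country code, and apply formatting only when rendering. The method names `normalize_nanp` and `render_nanp` are hypothetical, and the sketch assumes that a bare 10-digit input belongs to the NANP (country code 1), which is exactly the kind of guess the answer describes.

```ruby
# Capture: keep digits only, with country code, no separators.
# Assumption: a bare 10-digit number is NANP, so "1" is prepended.
def normalize_nanp(raw)
  digits = raw.gsub(/\D/, "")
  case digits
  when /\A1\d{10}\z/ then digits        # already carries the country code
  when /\A\d{10}\z/  then "1#{digits}"  # guess NANP for a missing prefix
  end                                   # anything else falls through to nil
end

# Render: pick the display format only at output time.
def render_nanp(stored)
  m = stored.match(/\A1(\d{3})(\d{3})(\d{4})\z/)
  m && "+1 #{m[1]}-#{m[2]}-#{m[3]}"
end

stored = normalize_nanp("(805) 682-4726")  # => "18056824726"
render_nanp(stored)                        # => "+1 805-682-4726"
```

This sketch is deliberately strict: inputs with trailing digits, such as an extension, return nil rather than being truncated, so they would need separate handling.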
Answer 3 (score: 0)
This is an old thread, but I thought I would share a solution.
def extract_phone_number(input)
  input.delete!('^0-9').gsub!(/^1?(\d{3})(\d{3})(\d{4})/, '\1-\2-\3')[0..11]
rescue NoMethodError
  nil
end
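delete! removes all non-digit characters.
gsub! matches the digits and then templates them into a hyphen-separated string.
[0..11] slices out the required digits (in case there is an extension).
The rescue block guards against the mutating methods being called on nil.
Using the tests posted above: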
tests = {
'+1 480-874-4666' => '480-874-4666',
'404-581-4000' => '404-581-4000',
'(805) 682-4726' => '805-682-4726',
'978-851-7321, Ext 2606' => '978-851-7321',
'413- 658-1100' => '413-658-1100',
'(513) 287-7000,Toll Free (800) 733-2077' => '513-287-7000',
'1 (813) 274-8130' => '813-274-8130',
'212-363-3200,Media Relations: 212-668-2251.' => '212-363-3200',
'323/221-2164' => '323-221-2164',
'' => nil,
'foobar' => nil,
'1234567' => nil
}
tests.each do |input, expected_output|
  input = input.dup if input.frozen?
  result = extract_phone_number(input)
  if result == expected_output
    print "PASS: #{result}\n"
  else
    print "FAIL (expected #{expected_output})\n"
  end
end
# Console
=> PASS: 480-874-4666
=> PASS: 404-581-4000
=> PASS: 805-682-4726
=> PASS: 978-851-7321
=> PASS: 413-658-1100
=> PASS: 513-287-7000
=> PASS: 813-274-8130
=> PASS: 212-363-3200
=> PASS: 323-221-2164
=> PASS:
=> PASS:
=> PASS:
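The `extract_phone_number` method in this last answer relies on `delete!` and `gsub!`, which return nil when they change nothing. As a result it also returns nil for an input that is already digits-only (for example "4805551234"), and it modifies the caller's string in place. A non-mutating sketch of the same delete-then-match idea, under the hypothetical name `extract_phone_number_safe`, might look like this:

```ruby
# Hypothetical non-mutating variant; the input string is left untouched.
def extract_phone_number_safe(input)
  digits = input.delete('^0-9')                   # returns a new string
  m = digits.match(/\A1?(\d{3})(\d{3})(\d{4})/)   # optional leading 1, then 3-3-4
  m && m.captures.join('-')                       # nil when fewer than 10 digits remain
end

extract_phone_number_safe('978-851-7321, Ext 2606')  # => "978-851-7321"
extract_phone_number_safe('4805551234')              # => "480-555-1234"
extract_phone_number_safe('1234567')                 # => nil
```

Since nothing is mutated, this version also works on frozen string literals without the `dup` used in the test loop.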