Remove phone numbers from text

Date: 2016-04-27 15:40:07

Tags: ruby-on-rails ruby regex

How can I remove phone numbers from a string when the numbers come in different formats?

For example, I have:

text='
(093) 123-34-56 (068) 123 45 67 (095) 123 456 78
    Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)
    Smart Functionality: Yes - xx TV Streaming Platform
    Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78'

How can I remove numbers in these formats from the text:

 09414241441 095-41-41-441 (096)4141441 091-123-11-22 094 00 111 222

How do I remove these phone numbers?

(093) 123-34-56 (068) 123 45 67 (095) 123 456 78

I tried gsub, but it removed every similar-looking number as well.

4 answers:

Answer 0 (score: 3)

You can use:

text.gsub(/\([0-9]*\)\s[0-9]*(-|\s)[0-9]*(-|\s)[0-9]*/, '')

This removes phone numbers in the formats shown in your text:

  • (XXX) XXX-XX-XX
  • (XXX) XXX XX XX

When you write a regular expression, try testing it on Rubular.

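  • \([0-9]*\) matches the digits inside parentheses (...). Parentheses are special characters in a regex, so a \ is placed in front of each one. [0-9] means a digit, and since there is more than one digit, * is added: zero or more digits may appear inside.

  • \s then requires a whitespace character.

  • (-|\s) requires a dash (-) or (|) a whitespace character (\s).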
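The breakdown above maps directly onto Ruby's extended (x) regex mode, where whitespace inside the pattern is ignored and # starts a comment. A minimal sketch of the same first pattern written that way, assuming the question's first line as sample input; the constant names PHONE_X and PHONE are mine, not from the answer.

```ruby
# Answer 0's first pattern spread over several lines with the x flag so
# every piece can carry a comment. Whitespace and comments inside an
# x-mode regex are ignored, so it should match the same strings.
PHONE_X = /
  \( [0-9]* \)   # literal opening paren, any digits, literal closing paren
  \s             # one whitespace character
  [0-9]*         # first group of digits
  (-|\s)         # a dash or a whitespace character
  [0-9]*         # second group of digits
  (-|\s)         # a dash or a whitespace character
  [0-9]*         # last group of digits
/x

# The same pattern on one line, as written in the answer.
PHONE = /\([0-9]*\)\s[0-9]*(-|\s)[0-9]*(-|\s)[0-9]*/

sample = "(093) 123-34-56 (068) 123 45 67 (095) 123 456 78\nRefresh Rate: 60Hz"

# Remove the numbers, collapse the leftover spaces and trim the ends.
puts sample.gsub(PHONE_X, '').squeeze(' ').strip
```

The commented form is easier to adjust later, for example when a new separator has to be allowed in one position only.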

For the other formats:

  • XXXXXXXXXX
  • XXX-XX-XX-XXX
  • (XXX)XXXXXXX

alongside the pattern above, use this one:

text.gsub(/\(*[0-9]+(\)|-)+\s*[0-9]+(-|\s)*[0-9]+(-|\s)*[0-9]+|[0-9]{10}/, '')
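To see what this second pattern does to each of the sample numbers from the question, a quick check like the one below can help. The harness is only a sketch of mine; the pattern is copied unchanged. By my reading, [0-9]{10} consumes exactly ten digits, so the 11-digit sample keeps its last digit, and the space-separated 094 00 111 222 is not among the formats the answer lists, so it is left alone.

```ruby
# Second pattern from this answer, copied unchanged.
PATTERN = /\(*[0-9]+(\)|-)+\s*[0-9]+(-|\s)*[0-9]+(-|\s)*[0-9]+|[0-9]{10}/

# Sample numbers taken from the question.
samples = ['09414241441', '095-41-41-441', '(096)4141441', '091-123-11-22', '094 00 111 222']

# Map each sample to whatever gsub leaves behind; an empty string means
# the whole number was removed.
results = samples.to_h { |s| [s, s.gsub(PATTERN, '')] }
results.each { |sample, rest| puts "#{sample.ljust(16)} => #{rest.inspect}" }
```

Hash#to_h with a block needs Ruby 2.6 or later.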

Answer 1 (score: 1)

Based on your formats, you can use the following regex:

/\(\d{3}\)\s+\d{3}[-\s]\d{2,3}[-\s]\d{2}/

Ruby code:

print text.gsub(/\(\d{3}\)\s+\d{3}[-\s]\d{2,3}[-\s]\d{2}/, "")

Ideone Demo
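The gsub above removes the numbers themselves but leaves the spaces between them, so the first line turns into a whitespace-only line. A minimal self-contained sketch on a shortened copy of the question's text; dropping blank lines afterwards is my assumption about what the asker wants, not part of the answer.

```ruby
# Shortened copy of the question's sample text.
text = "\n(093) 123-34-56 (068) 123 45 67 (095) 123 456 78\n" \
       "    Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n"

# The regex from this answer, unchanged.
PHONE = /\(\d{3}\)\s+\d{3}[-\s]\d{2,3}[-\s]\d{2}/

# Remove the numbers; the spaces between them and the newlines stay behind.
without_numbers = text.gsub(PHONE, '')

# Assumption: lines that are blank after the removal can be dropped.
cleaned = without_numbers.lines.reject { |l| l.strip.empty? }.join
puts cleaned
```

Note that rejecting blank lines also drops the empty line the original string starts with.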

Answer 2 (score: 0)

If your text has a fixed format, so that the numbers are always on the first line of the block, just remove the first line:

text='
(093) 123-34-56 (068) 123 45 67 (095) 123 456 78
    Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)
    Smart Functionality: Yes - xx TV Streaming Platform
    Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78'

text.strip
# => "(093) 123-34-56 (068) 123 45 67 (095) 123 456 78\n    Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n    Smart Functionality: Yes - xx TV Streaming Platform\n    Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"
text.strip.lines
# => ["(093) 123-34-56 (068) 123 45 67 (095) 123 456 78\n", "    Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n", "    Smart Functionality: Yes - xx TV Streaming Platform\n", "    Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"]
text.strip.lines[1..-1].join
# => "    Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n    Smart Functionality: Yes - xx TV Streaming Platform\n    Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"

Or:

lines = text.strip.lines
# => ["(093) 123-34-56 (068) 123 45 67 (095) 123 456 78\n", "    Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n", "    Smart Functionality: Yes - xx TV Streaming Platform\n", "    Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"]
lines.shift
# => "(093) 123-34-56 (068) 123 45 67 (095) 123 456 78\n"
lines.join
# => "    Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n    Smart Functionality: Yes - xx TV Streaming Platform\n    Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"
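Both variants strip the text and then discard its first line; the second one mutates the array with shift. A minimal sketch of a non-mutating alternative using Array#drop (my substitution, not part of the answer), on a shortened copy of the sample text:

```ruby
# Shortened sample: the phone numbers sit on the first non-blank line.
text = "\n(093) 123-34-56 (068) 123 45 67 (095) 123 456 78\n" \
       "    Refresh Rate: 60Hz (Native)\n" \
       "    Smart Functionality: Yes\n"

# strip removes the leading and trailing "\n", lines splits the text
# into an array, and drop(1) returns a new array without the first line.
rest = text.strip.lines.drop(1).join
puts rest
```

Because strip runs first, the trailing newline of the last line is gone as well.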

Using a regex with gsub can work, but it is also more likely to become a maintenance problem.

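If the phone numbers are always on one line, but not necessarily the first one, I would still use lines to split the text into an array, but then use reject with a regex that matches the number pattern, checking each line and rejecting the one that looks like it contains phone numbers: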

lines = text.lines
lines.reject{ |l| l[/\(\d{3}\) \d{3}[ -]\d{2,3}[ -]\d{2,3}/] }
# => ["\n", "    Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n", "    Smart Functionality: Yes - xx TV Streaming Platform\n", "    Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"]

lines.reject{ |l| l[/\(\d{3}\) \d{3}[ -]\d{2,3}[ -]\d{2,3}/] }.join
# => "\n    Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n    Smart Functionality: Yes - xx TV Streaming Platform\n    Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"

Note that because strip isn't used here, the leading "\n" is retained.

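Using lines to turn the text into an array helps isolate the damage if something else happens to trigger the pattern match and the text gets modified unintentionally.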

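Where this approach fails is when the phone numbers are scattered throughout the text. Even then, I would probably still split the text into individual lines first, because that limits the possible damage if there are false positives.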
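For numbers scattered across several lines, one way to keep the line-by-line isolation is to run gsub on each line instead of rejecting whole lines. A minimal sketch, assuming Ruby 2.7+ for filter_map; the helper name remove_phones_per_line and the "Call ... for details" line are made up for illustration, and the pattern is the corrected one from this answer.

```ruby
# Pattern from this answer (with \d{2,3} in the middle group).
PHONE = /\(\d{3}\) \d{3}[ -]\d{2,3}[ -]\d{2,3}/

# Hypothetical helper: removes the numbers from each line and drops a
# line only when nothing but whitespace is left after the removal.
def remove_phones_per_line(text, pattern = PHONE)
  text.lines.filter_map do |line|
    next line unless line.match?(pattern)  # lines without a match pass through
    cleaned = line.gsub(pattern, '')
    cleaned unless cleaned.strip.empty?    # nil drops the line
  end.join
end

# Made-up sample: one line holds only numbers, another mixes a number
# with other text.
text = "\n(093) 123-34-56 (068) 123 45 67\nSmart Functionality: Yes\nCall (095) 123 456 78 for details\n"
puts remove_phones_per_line(text)
```

Lines that never match are returned untouched, so a false positive can only affect the lines that actually contain something phone-like.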

Answer 3 (score: 0)

phone_formats = [/\(\d{3}\) \d{3}-\d{4}/,
                 /\d{3}-\d{3}-\d{4}/,
                 /\d{3} \d{3} \d{4}/,
                 /\(\d{3}\) \d{3} \d{3} \d{2}/,
                 /\(\d{3}\) \d{3} \d{2} \d{2}/,
                 /\(\d{3}\) \d{3}-\d{2}-\d{2}/,
                 /\d{3}-\d{3}-\d{2}-\d{2}/]

r = Regexp.union(phone_formats)
  #=> /(?-mix:\(\d{3}\) \d{3}-\d{4})|
  #    (?-mix:\d{3}-\d{3}-\d{4})|
  #    (?-mix:\d{3} \d{3} \d{4})|
  #    (?-mix:\(\d{3}\) \d{3} \d{3} \d{2})|
  #    (?-mix:\(\d{3}\) \d{3} \d{2} \d{2})|
  #    (?-mix:\(\d{3}\) \d{3}-\d{2}-\d{2})|
  #    (?-mix:\d{3}-\d{3}-\d{2}-\d{2})/

(For readability, I broke the return value of Regexp.union onto a new line after each |.)

text =<<_
(093) 123-34-56 (068) 123 45 67 (095) 123 456 78
Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)
Smart Functionality: Yes - xx TV Streaming Platform
Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18,
TV with stand (inches) : 28.98x18.68x7.78
_

puts text.gsub(r,'')

Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)
Smart Functionality: Yes - xx TV Streaming Platform
Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18,
TV with stand (inches) : 28.98x18.68x7.78
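The same Regexp.union approach can be extended to the other formats listed in the question. A sketch under my own assumptions: the format list is read off the question's examples, and the (?<!\d) and (?!\d) lookarounds around the union are an addition of mine so that a pattern cannot match a run of digits that is part of a longer number, which addresses the "removed all similar numbers" problem from the question.

```ruby
# Formats read off the question's examples (one sample per pattern).
phone_formats = [
  /\d{11}/,                        # 09414241441
  /\d{3}-\d{2}-\d{2}-\d{3}/,       # 095-41-41-441
  /\(\d{3}\)\d{7}/,                # (096)4141441
  /\d{3}-\d{3}-\d{2}-\d{2}/,       # 091-123-11-22
  /\d{3} \d{2} \d{3} \d{3}/,       # 094 00 111 222
  /\(\d{3}\) \d{3}-\d{2}-\d{2}/,   # (093) 123-34-56
  /\(\d{3}\) \d{3} \d{2} \d{2}/,   # (068) 123 45 67
  /\(\d{3}\) \d{3} \d{3} \d{2}/    # (095) 123 456 78
]

# Assumption: a match may not start right after a digit or end right
# before one, so digits inside longer numbers are left alone.
phone = /(?<!\d)(?:#{Regexp.union(phone_formats)})(?!\d)/

text = <<~TEXT
  09414241441 095-41-41-441 (096)4141441 091-123-11-22 094 00 111 222
  (093) 123-34-56 (068) 123 45 67 (095) 123 456 78
  Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)
  Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18
TEXT

# Remove the numbers, then drop lines that held nothing else.
cleaned = text.gsub(phone, '').lines.reject { |l| l.strip.empty? }.join
puts cleaned
```

With the lookarounds in place, a 12-digit run such as 123456789012 is not touched, whereas a bare /\d{11}/ would remove its first eleven digits.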