Question

I have a search string of text entered by the user.

If it contains any part of a postal code, such as 1N1, 1N11N1, or 1N1 1N1, I want to remove it from the text.

Examples:

John Doe 1n11n1

or

1n1 John Doe

or

John 1n11n1 Doe

I want to capture this:

postal_code: 1n11n1
other: John Doe

Can this be done with a regular expression?

Answer 1

Try matching the regex /((?:\d[A-Za-z]\d)+)/ and returning $1:

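def get_postal_code(s)
  # One or more "digit, letter, digit" groups, e.g. "1n1" or "1n11n1"
  r = /((?:\d[A-Za-z]\d)+)/
  # On a match, return [postal_code, rest_of_string]; otherwise nil
  return (s =~ r) ? [$1, s.sub(r,'')] : nil
end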

# Example usage...
get_postal_code('John Doe 1n11n1') # => ['1n11n1', 'John Doe ']
get_postal_code('1n1 John Doe') # => ['1n1', ' John Doe']
get_postal_code('John Doe 1n1') # => ['1n1', 'John Doe ']

You can also clean up the "other" string by changing the return line as follows:

  ...
  return (s =~ r) ? [$1, s.sub(r,'').gsub(/\s+/,' ').strip] : nil
end
get_postal_code('John Doe 1n11n1') # => ['1n11n1', 'John Doe']
get_postal_code('1n1 John Doe') # => ['1n1', 'John Doe']
get_postal_code('John Doe 1n1') # => ['1n1', 'John Doe']
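The regex above stops at a space, so for the 1N1 1N1 form mentioned in the question it only captures the first half ("John 1n1 1n1 Doe" would leave "1n1" behind). A minimal sketch, assuming at most one optional whitespace character separates the two halves; the method name extract_postal_code, the POSTAL_CODE constant, and the hash keys are hypothetical, chosen to mirror the output format in the question:

```ruby
# Hypothetical pattern: "1n1", optionally followed by an optional space and a second "1n1"
POSTAL_CODE = /\d[A-Za-z]\d(?:\s?\d[A-Za-z]\d)?/

# Hypothetical helper: returns { postal_code:, other: }, or nil when nothing matches
def extract_postal_code(s)
  code = s[POSTAL_CODE]          # first match, or nil
  return nil unless code
  # Remove the code, collapse leftover runs of whitespace, and trim the ends
  other = s.sub(POSTAL_CODE, '').gsub(/\s+/, ' ').strip
  { postal_code: code, other: other }
end

result = extract_postal_code('John 1n1 1n1 Doe')
puts "postal_code: #{result[:postal_code]}"  # prints "postal_code: 1n1 1n1"
puts "other: #{result[:other]}"              # prints "other: John Doe"
```

The optional group is limited to one repetition because the question only describes codes of one or two halves.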

Answer 2

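I'm not sure what format your postal codes are in, but I'd definitely check regexlib: http://regexlib.com/Search.aspx?k=postal%20code

You'll find many regexes there for matching postal codes in a string. To get the rest of the string, just remove the postal code with a regex and keep the resulting string. There may be a more efficient way to do this, but I'm aiming for simplicity :)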
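Whichever pattern you pick from regexlib, the "remove it and keep the rest" step can be sketched with Ruby's String#partition, which splits a string around the first regex match into [before, match, after]. The pattern below is only a placeholder assumption (digit-letter-digit, optionally repeated once); substitute the regexlib pattern for your postal-code format:

```ruby
# Placeholder pattern (an assumption): swap in the regexlib pattern you choose
pattern = /\d[A-Za-z]\d(?:\s?\d[A-Za-z]\d)?/

text = 'John 1n1 Doe'
# partition returns [text_before_match, match, text_after_match];
# when nothing matches it returns [text, "", ""]
before, code, after = text.partition(pattern)
# Join the pieces, collapse the doubled space left by the removal, trim the ends
rest = (before + after).squeeze(' ').strip

puts code  # prints "1n1"
puts rest  # prints "John Doe"
```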

Hope this helps!

Answer 3

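Yes, this can be done with a regular expression. Depending on what kind of data appears in the line, you risk false positives, since anything that matches the pattern will be treated as a postal code (though with your examples that seems unlikely).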
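One way to reduce those false positives is to require the code to stand on its own as a word. A minimal sketch, treating 1 as a digit and N as a letter and assuming postal codes are always delimited by spaces or the ends of the string, wraps the pattern in \b anchors so that a digit-letter-digit run buried inside a longer token (the made-up product name X1y2 below) is no longer picked up:

```ruby
loose  = /\d[A-Za-z]\d/
# \b requires a word/non-word transition on both sides of the match
strict = /\b\d[A-Za-z]\d(?:\s?\d[A-Za-z]\d)?\b/

text = 'Model X1y2 John Doe'
p text[loose]    # => "1y2" (a false positive)
p text[strict]   # => nil
```

The trade-off is that a code glued to other text (for example "Doe1n1") will no longer be found either.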

Assuming that in your pattern N is an alphabetic character and 1 is a numeric character, you could do something like the following:

strings = ["John Doe 1n11n1", "1n1 John Doe", "John 1n1 1n1 Doe"]
# Alternatives, tried left to right: "1n11n1", then "1n1 1n1", then a lone "1n1"
regex = /([0-9][A-Za-z][0-9]{2}[A-Za-z][0-9]|[0-9][A-Za-z][0-9]\s[0-9][A-Za-z][0-9]|[0-9][A-Za-z][0-9])/
strings.each do |s|
  m = regex.match(s)
  next unless m                      # skip strings with no postal code
  puts "postal_code: #{m[1]}"
  puts "rest: #{s.gsub(regex, "")}"  # remove every match
  puts
end

Output:

postal_code: 1n11n1
rest: John Doe 

postal_code: 1n1
rest:  John Doe

postal_code: 1n1 1n1
rest: John  Doe

If you want to get rid of the extra spaces, you can use String#squeeze(" ") :)
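Note that squeeze(" ") only collapses runs of spaces into a single space; it leaves one leading or trailing space in place. A short sketch, reusing the "rest" strings printed above, shows that pairing it with strip cleans up all three cases:

```ruby
# "rest" values as printed by the loop above
rests = ["John Doe ", " John Doe", "John  Doe"]

p "John Doe ".squeeze(" ")  # => "John Doe " (trailing space survives)

rests.each do |r|
  # squeeze(" ") turns runs of spaces into one; strip trims both ends
  p r.squeeze(" ").strip    # => "John Doe" each time
end
```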

Regex to extract postal code from string
