Question

I have a regular expression that returns words containing punctuation, standalone punctuation, and words without any punctuation.

I want to improve it so that it returns only punctuation or words that contain punctuation.

At the moment I use it like this:

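require "active_support/core_ext/object/blank" # provides present? outside Rails

class String
  # Split into words (letters, digits, apostrophes, hyphens) and runs of punctuation.
  def words_and_punctuation
    scan(/[\w'-]+|[[:punct:]]+/)
  end

  # True if the string contains at least one punctuation character.
  def punctuation?
    scan(/\s?[[:punct:]]/).present?
  end
end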

string = "The man's hat is really, very nice."

string.words_and_punctuation.select(&:punctuation?)

I would rather have the regex itself select and match the right elements, without the extra filtering step.

Any help is appreciated.

Answer 1

"The man's hat is really, very nice.".
  scan(/\w+[[:punct:]]\w+|[[:punct:]](?=\s|\z)/)
#⇒ ["man's", ",", "."]

This might suit your needs. It is quite imprecise, since it also matches typos such as "foo!bar", but it should be good enough for this particular task.
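The caveat above can be checked directly. This is only a small sketch: the constant names and the second sample sentence are made up for illustration and are not part of the answer.

```ruby
# Pattern from the answer: a word with one punctuation mark inside it,
# or a single punctuation mark followed by whitespace or end of string.
INNER_OR_TRAILING = /\w+[[:punct:]]\w+|[[:punct:]](?=\s|\z)/

# The sentence from the question behaves as intended.
INTENDED = "The man's hat is really, very nice.".scan(INNER_OR_TRAILING)
# => ["man's", ",", "."]

# A missing space after "!" is treated just like an apostrophe inside a word.
TYPO = "Stop foo!bar now.".scan(INNER_OR_TRAILING)
# => ["foo!bar", "."]
```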

Answer 2

How about this?

 /[a-zA-Z]+['-][a-z]+|[[:punct:]]/

I tried it on a few sentences:

2.4.1 :056 > r = Regexp.new /[a-zA-Z]+['-][a-z]+|[[:punct:]]/
=> /[a-zA-Z]+['-][a-z]+|[[:punct:]]/
2.4.1 :057 > "The man's hat was, very nice".scan(r)
=> ["man's", ","]
2.4.1 :058 > "The man's hat was, very nice.".scan(r)
=> ["man's", ",", "."]
2.4.1 :059 > "The man's hat was, very nice. who. . would have thougt so?".scan(r)
=> ["man's", ",", ".", ".", ".", "?"]

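It works because only a few punctuation marks occur inside standard English words: hyphens and apostrophes. So the first part of the regex, before the pipe character, looks for those words, while the second part catches every other punctuation mark.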
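To see which half of the alternation produced each match, one option is to wrap the two halves in named groups. The group names and constant names below are illustrative assumptions, not part of the answer.

```ruby
# Same two alternatives as above, each wrapped in a named group.
TAGGED = /(?<word>[a-zA-Z]+['-][a-z]+)|(?<punct>[[:punct:]])/

# When the pattern has groups, scan returns one [word, punct] pair per match;
# the alternative that did not take part in the match is nil.
PAIRS = "The man's hat was, very nice.".scan(TAGGED)
# => [["man's", nil], [nil, ","], [nil, "."]]
```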

Answer 3

The common requirement is that punctuation is present, so make it mandatory:

def words_and_punctuation
    scan(/(?:[[:punct:]]|[\w'-])*[[:punct:]]+(?:[[:punct:]]|[\w'-])*/)
end

In a more typical regex flavor, we might write this pattern as:

[&$#^@.A-Za-z0-9'-]*[&$#^@.]+[&$#^@.A-Za-z0-9'-]*

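In other words, this matches one or more punctuation characters, optionally surrounded by word characters or more punctuation. It does not match words that contain no punctuation.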
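A quick check of the Ruby pattern against the sentence used elsewhere in this thread (the constant names are illustrative). Note that trailing punctuation stays attached to the preceding word, so the result is "really," rather than a separate ",":

```ruby
# Zero or more word/punctuation characters, at least one mandatory
# punctuation character, then zero or more word/punctuation characters.
PUNCTUATED = /(?:[[:punct:]]|[\w'-])*[[:punct:]]+(?:[[:punct:]]|[\w'-])*/

# Only non-capturing groups are used, so scan returns plain strings.
TOKENS = "The man's hat is really, very nice.".scan(PUNCTUATED)
# => ["man's", "really,", "nice."]
```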

Answer 4

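I realized that my requirements were a bit broader than what I originally posted.

I need to match partially hyphenated words (e.g. "-fast") and even ones like "pay-as-you-go".

So I came up with the following regex.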

regex = /\w*['-]\w*[-]*\w*[-]*\w*|[[:punct:]]+/

string = "The man, had a big-cat that his Sister's aunt gave him and was -fast 's very-very-big-cat.!!"

The sentence doesn't make much sense, but it includes good examples of the punctuation and punctuated words I need to match.

string.scan(regex)

=> [",", "big-cat", "Sister's", "-fast", "'s", "very-very-big-cat", ".!!"]

There may be ways to improve how the regex is written, but this is the best I could do to get the results I need.
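One possible tidy-up, offered as a sketch rather than a drop-in replacement: repeating a group of "apostrophe or hyphen followed by word characters" removes the fixed limit on the number of hyphens. It is not strictly equivalent to the regex above (for example, it also allows an apostrophe after a hyphen), and the constant names are made up for illustration.

```ruby
# Optional leading word characters, then one or more segments that each
# start with an apostrophe or hyphen; otherwise a run of punctuation.
REPEATED = /\w*(?:['-]\w*)+|[[:punct:]]+/

SAMPLE = "The man, had a big-cat that his Sister's aunt gave him and was -fast 's very-very-big-cat.!!"

RESULT = SAMPLE.scan(REPEATED)
# => [",", "big-cat", "Sister's", "-fast", "'s", "very-very-big-cat", ".!!"]

# Any number of hyphens is matched as a single word.
LONGER = "pay-as-you-go-now".scan(REPEATED)
# => ["pay-as-you-go-now"]
```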

How can I fix this regex so that it only returns punctuation and words that contain punctuation?

4 answers: