Question

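I have been reading, searching, and trying different ways of writing a regular expression, such as \p{L}, [a-z], and \w, but I cannot seem to get the results I am after.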

The problem

I have an array of full sentences with punctuation, which I parse using the following preg_match_all call. It retains the words and the punctuation nicely.

preg_match_all('/(\w+|[.;?!,:])/', $match, $matches)

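However, I now have words like these:

Word-another-word
more_words_like_these

I would like to retain the integrity of these words as they are (connected), but my current preg_match_all breaks them down into individual words.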

What I have tried

preg_match_all('/(\p{L}-\p{L}+|[.;?!,:])/', $match, $matches)

and

preg_match_all('/((?i)^[\p{L}0-9_-]+|[.;?!,:])/', $match, $matches)

which I found here, but neither gives the desired result:

Array ( [0] A, [1] word, [2] like_this, [3] connected, [4] ; ,[5] with-relevant-punctuation)

Ideally, I would also like to account for special characters, since some of these words may have accents.

Answer 1

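Just add the hyphen to the character class that matches word characters. Note that the hyphen must appear at the beginning or end of the class; otherwise it is treated as a range operator.

([\w-]+|[.;?!,:])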
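
As a rough sketch of that idea, the snippet below runs a pattern with the hyphen placed last in a class alongside \w against a made-up sentence. The sentence and the variable names ($sentence, $tokens) are illustrative assumptions and do not come from the original post.

```php
<?php
// Hypothetical input sentence, for illustration only.
$sentence = 'A word, like_this; connected with-relevant-punctuation!';

// [\w-]+   : letters, digits, underscore, or hyphen (hyphen last, so it is literal)
// [.;?!,:] : a single punctuation mark, kept as its own token
preg_match_all('/[\w-]+|[.;?!,:]/', $sentence, $matches);

$tokens = $matches[0];
print_r($tokens);
```

Be aware that [\w-]+ also accepts a lone hyphen or a word with a leading or trailing hyphen. If that is not wanted, a stricter form such as \w+(?:-\w+)* is one possible alternative.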


Example

Live demo

https://regex101.com/r/yI3tM4/2

Sample text

However, I now have words like these:

Word-another-word
more_words_like_these

and I would like to be able to retain the integrity of these words as they are (connected) but my current preg_match breaks them down into individual words.

Sample matches

The other words are captured as before, but hyphenated words are now captured whole as well.

Omitted Match 1-9 for brevity 

MATCH 10
1.  [39-56] `Word-another-word`

MATCH 11
1.  [57-78] `more_words_like_these`

Omitted Match 12+ for brevity
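
The offsets listed above can be checked locally with PREG_OFFSET_CAPTURE. This is only a sketch: it assumes the pattern ([\w-]+|[.;?!,:]) and an input made of the first three lines of the sample text, separated exactly as shown (one blank line after the first sentence). The variable names are illustrative.

```php
<?php
// First three lines of the sample text, with a blank line after the first sentence.
$text = "However, I now have words like these:\n\nWord-another-word\nmore_words_like_these";

// PREG_OFFSET_CAPTURE turns each match into a [string, byte offset] pair.
preg_match_all('/([\w-]+|[.;?!,:])/', $text, $matches, PREG_OFFSET_CAPTURE);

// Matches 10 and 11 sit at zero-based indexes 9 and 10 of capture group 1.
foreach ([9, 10] as $i) {
    [$token, $start] = $matches[1][$i];
    printf("MATCH %d [%d-%d] %s\n", $i + 1, $start, $start + strlen($token), $token);
}
```

The end offset printed here is exclusive, which is the convention the listing above appears to use.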

Explanation

For the pattern ([\w-]+|[.;?!,:]):

NODE                     EXPLANATION
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [\w-]+                   any character of: word characters (a-z,
                             A-Z, 0-9, _), '-' (1 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    [.;?!,:]                 any character of: '.', ';', '?', '!',
                             ',', ':'
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
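
The question also asks about accented words, which \w does not cover by default in PHP. One possible approach, sketched here under the assumption that the input is valid UTF-8, is to swap \w for Unicode property classes and add the /u modifier. The sample sentence and variable names are made up for illustration.

```php
<?php
// Hypothetical sentence with accented, hyphenated and underscored words.
// The source file itself must be saved as UTF-8.
$sentence = 'Un café-crème, déjà_vu; voilà!';

// \p{L} = any letter, \p{M} = combining marks, \p{N} = any number.
// The trailing /u modifier makes PCRE treat pattern and subject as UTF-8.
preg_match_all('/[\p{L}\p{M}\p{N}_-]+|[.;?!,:]/u', $sentence, $matches);

$tokens = $matches[0];
print_r($tokens);
```

Without the /u modifier the subject is processed byte by byte, so multi-byte letters such as é would not be matched as letters.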

Regex for words connected with hyphens and underscores while keeping punctuation
