Question

I want to get all the periods, together with the words around them, from some text. The following text is an example:

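This research aims to design the production of isoeugenol and vanillin from the eugenol of clove leaf oil, and to analyze economically its potential product development. The specific objectives of this research work are: 1. Identification of isoeugenol and vanillin. 2. Model simulation of isoeugenol and vanillin process design. 3. Study of financial feasibility and added value. The research is expected to provide the greatest economic potential for eugenol, to increase the added value of clove leaf oil. The results show that FTIR and NMR of the product confirm that the isoeugenol and vanillin present in the synthesized product are identical to the reference standards.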

When I use the pattern

\w+\.\s\w+

on the string above, it matches the "vanillin. 2" part of "and vanillin. 2. Model simulation" but skips "2. Model".

I want it to match both "vanillin. 2" and "2. Model".

Can you suggest an improvement so that I get all the periods?

Answer 1

Use a positive lookahead assertion together with a capturing group:

(?=(\b\w+\.(?:\s+\w+|$)))

Use it as follows:

preg_match_all('/(?=(\b\w+\.(?:\s+\w+|$)))/', $subject, $result, PREG_PATTERN_ORDER);
$result = $result[1];

Explanation

(?=       # Assert that the following can be matched at the current position:
 (        # Capture into group number 1:
  \b      # - Beginning of a word
  \w+     # - an alphanumeric word
  \.      # - a dot
  (?:     # - Then either...
   \s+\w+ #   - whitespace and another word
  |       # - or... 
   $      #   - the end of the string.
  )       # End of alternation
 )        # End of capturing group 1
)         # End of lookahead

See it in action on regex101.com
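
To make the difference concrete, here is a minimal sketch that runs both the question's pattern and the lookahead pattern on a short fragment. The fragment and the variable names ($consumed, $overlapping) are chosen for illustration only; the two patterns are the ones quoted above.

```php
<?php
// Short illustrative fragment, modeled on the example text in the question.
$subject = 'isoeugenol and vanillin. 2. Model simulation of the process design. 3. Study';

// Pattern from the question: each match consumes its characters, so the "2"
// used up by "vanillin. 2" cannot also start a match for "2. Model".
preg_match_all('/\w+\.\s\w+/', $subject, $consumed);

// Lookahead pattern from the answer: the overall match is empty and the text
// of interest lands in group 1, so neighbouring matches may overlap.
preg_match_all('/(?=(\b\w+\.(?:\s+\w+|$)))/', $subject, $overlapping);

print_r($consumed[0]);    // vanillin. 2, design. 3
print_r($overlapping[1]); // vanillin. 2, 2. Model, design. 3, 3. Study
```

Because the lookahead itself matches the empty string, the regex engine advances one character at a time, and the \b keeps it from also reporting partial words such as "anillin. 2".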

Get periods and the words before and after them (overlapping matches)
