Question

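I am trying to use a regular expression in Python to capture whole words from text. That part is simple, but I also want to strip the contractions and possessives marked by an apostrophe.

Currently I have (?iu)(?<!')(?!n')[\w]+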

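Testing it on the following text

One tree or many trees? My tree's green. I didn't figure this out yet.

gives these matches

One tree or many trees My tree green I didn figure this out yet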
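
For reference, here is a minimal sketch that reproduces the matches above with re.findall. The names QUESTION_RE, text and matches are only illustrative. It also shows why "didn" comes out whole: the (?!n') lookahead is only checked at the position where a match starts, so it does nothing to stop [\w]+ from consuming the trailing "n".

```python
import re

# Pattern from the question, unchanged.
QUESTION_RE = re.compile(r"(?iu)(?<!')(?!n')[\w]+")

text = "One tree or many trees? My tree's green. I didn't figure this out yet."

# The lookbehind rejects the "s" and "t" that follow an apostrophe,
# but "didn" is still matched as one word.
matches = QUESTION_RE.findall(text)
print(matches)
```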

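In this example, the negative lookbehind prevents the "s" and "t" that follow an apostrophe from being matched as whole words. But how do I write the negative lookahead (?!n') so that the matches include "did" instead of "didn"?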

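(My use case is a simple Python spell checker in which each word is checked for correct spelling. I ended up using the autocorrect module, because pyenchant, aspell-python and others didn't work when installed via pip.)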

Answer 1

I would use this regex:

(?<![\w'])\w+?(?=\b|n't)

This matches as few word characters as necessary until it reaches either a word boundary or the text n't.

Result:

>>> re.findall(r"(?<![\w'])\w+?(?=\b|n't)", "One tree or many trees? My tree's green. I didn't figure this out yet.")
['One', 'tree', 'or', 'many', 'trees', 'My', 'tree', 'green', 'I', 'did', 'figure', 'this', 'out', 'yet']

Breakdown:

(?<!         # negative lookbehind: assert the text is not preceded by...
    [\w']    # ... a word character or apostrophe
)
\w+?         # match word characters, as few as necessary, until...
(?=
    \b       # ... a word boundary...
|            # ... or ...
    n't      # ... the text "n't"
)
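
The commented breakdown above can be passed to Python almost as-is by compiling it with re.VERBOSE, which ignores the whitespace and the # comments inside the pattern. The sketch below assumes that is acceptable for the spell-checker use case; WORD_RE and words are hypothetical names chosen for illustration.

```python
import re

# Same regex as in the answer, written in verbose form.
WORD_RE = re.compile(
    r"""
    (?<![\w'])   # not preceded by a word character or an apostrophe
    \w+?         # as few word characters as necessary, until...
    (?=\b|n't)   # ... a word boundary or the text "n't"
    """,
    re.VERBOSE,
)

def words(text):
    """Return the words in text, minus possessive and n't suffixes."""
    return WORD_RE.findall(text)

sample = "One tree or many trees? My tree's green. I didn't figure this out yet."
print(words(sample))

# Edge case: the "n" is cut off even when it belongs to the base word.
print(words("I can't and I won't."))
```

One thing to watch for: because the match stops in front of any n't, contractions whose base word ends in "n" or changes form are truncated, so "can't" yields "ca" and "won't" yields "wo". A spell checker would probably need to special-case those.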

Python regex to match whole words (minus contractions and possessives)
