Question

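I'm running the simple code below to get all tokens that contain a given word (e.g. words including `compared`, such as `notcompared` or `thiscompared`).

However, the spaCy regex doesn't return anything. The same regex works fine with Python's `re`.

Could you let me know whether this is a spaCy problem, or how to fix it?

It returns [], an empty list.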

import plac
from spacy.lang.en import English
from spacy.matcher import PhraseMatcher, Matcher
from spacy.tokens import Doc, Span, Token
import spacy

nlp = spacy.load("en_core_web_sm")

text = """
"Net income was $9.4 million acompared to the prior year of $2.7
million.",
"Revenue exceeded twelve billion dollars, with a loss of $1b. run",
"""

doc = nlp(text)

pattern = [{"LOWER": {"REGEX": "\b\wcompared\w\b"}}]

matcher = Matcher(nlp.vocab)
matcher.add("item", None, pattern )
matches = matcher(doc)
print(matches)
print(matcher)

This code should return the position of the token that contains `compared`.

Answer 1

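I don't see this regex working even with Python's `re`, because nothing in your text fits the pattern it tries to match:

one word character, followed by compared, followed by one word character (surrounded by word boundaries)

You can simply change your regex to

\b\w*compared\w*\b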
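As a quick check of the point above, here is a minimal sketch using only Python's `re` module. The names `strict` and `loose` are illustrative and not from the original post. It also shows a related pitfall: in a normal (non-raw) string literal `"\b"` is a backspace character, so writing patterns as raw strings is the safer habit.

```python
import re

text = "Net income was $9.4 million acompared to the prior year of $2.7 million."

# In a non-raw string literal, "\b" is the backspace character (0x08),
# so the regex engine never receives a word-boundary assertion.
assert "\b" == "\x08"
assert r"\b" == "\\b"

# Illustrative names: exactly one word character on each side ...
strict = r"\b\wcompared\w\b"
# ... versus zero or more word characters on each side.
loose = r"\b\w*compared\w*\b"

strict_hits = re.findall(strict, text)
loose_hits = re.findall(loose, text)

print(strict_hits)
print(loose_hits)
```

The strict pattern finds nothing because `acompared` is followed by a space rather than a word character, while the loose pattern picks up the whole word.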


Answer 2

RegEx 1

If we want to find any word that has `compared` in it, maybe this expression might work:

\b\w*(?:compared)\w*\b

Test with `re.finditer`

import re

regex = r"\b\w*(?:compared)\w*\b"

test_str = "some text you wish before then compared or anythingcompared or any_thing_01_compared_anything_after_that "

matches = re.finditer(regex, test_str)

# Report every match with its start and end offsets.
for matchNum, match in enumerate(matches, start=1):
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum=matchNum, start=match.start(), end=match.end(), match=match.group()))

    # Report capture groups, if the pattern defines any.
    for groupNum in range(1, len(match.groups()) + 1):
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum=groupNum, start=match.start(groupNum), end=match.end(groupNum), group=match.group(groupNum)))

RegEx 2

If we want to find the strings that have `compared` in them, I guess this expression in `s` mode,

^(?=.*\b(?:acompared|thiscompared|notcompared)\b).*$

or this one in `m` mode

^(?=[\s\S]*\b(?:acompared|thiscompared|notcompared)\b)[\s\S]*$

might be a start to solving this problem. The alternatives are grouped so that the leading `.*` (or `[\s\S]*`) applies to all three words, not only the first.

Test with `re.findall`

import re

regex = r"^(?=.*\b(?:acompared|thiscompared|notcompared)\b).*$"

test_str = ("Net income was $9.4 million acompared to the prior year of $2.7 million.,\n\n"
    "some other words with new lines")

# re.DOTALL is the `s` mode: `.` also matches newlines.
print(re.findall(regex, test_str, re.DOTALL))

Answer 3

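Although the answers above work for Python's `re`, spaCy requires a specific pattern description format. The pattern should use the `TEXT` key instead. For example:

pattern = [{"TEXT": {"REGEX": "compared*"}}]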
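To get a feel for what a token-level regex like this selects, here is a rough sketch in plain Python. It assumes the regex is applied to each token's text as a search (as `re.search` does), which is an assumption about spaCy's behavior and not something the answers above confirm. The helper `tokens_matching` is hypothetical, and `str.split()` is only a crude stand-in for spaCy's tokenizer.

```python
import re

def tokens_matching(pattern, text):
    """Return the lowercased whitespace tokens in which `pattern` is found.

    Hypothetical helper: mimics a per-token regex search, not spaCy itself.
    """
    regex = re.compile(pattern)
    # str.split() is a crude stand-in for a real tokenizer.
    return [tok for tok in text.lower().split() if regex.search(tok)]

sample = "We compare results; revenue was acompared and notcompared to last year"

# "compared*" means "compare" followed by zero or more "d" characters.
broad = tokens_matching(r"compared*", sample)
# "compared" requires the full word fragment.
exact = tokens_matching(r"compared", sample)

print(broad)
print(exact)
```

Under these assumptions, the trailing `*` makes the pattern match `compare` as well, so plain `compared` may be the better choice if only tokens containing the full word are wanted.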
