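I am running the simple code below to get all tokens that contain the word (for example, words such as compared, notcompared, thiscompared).
However, the spaCy regex returns nothing. The regex works fine with Python re.
Could you let me know whether this is a spaCy issue, or how to fix it?
It returns [], an empty list.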
import plac
from spacy.lang.en import English
from spacy.matcher import PhraseMatcher, Matcher
from spacy.tokens import Doc, Span, Token
import spacy
nlp = spacy.load("en_core_web_sm")
text = """
"Net income was $9.4 million acompared to the prior year of $2.7
million.",
"Revenue exceeded twelve billion dollars, with a loss of $1b. run",
"""
doc = nlp(text)
pattern = [{"LOWER": {"REGEX": "\b\wcompared\w\b"}}]
matcher = Matcher(nlp.vocab)
matcher.add("item", None, pattern )
matches = matcher(doc)
print(matches)
print(matcher)
This code should return the position of the "compared" token.
Answer 0 (score: 2)
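I don't see this regex working even with Python re, because it tries to match
\b\wcompared\w\b
and nothing in your text fits that pattern:
exactly one word character, followed by compared, followed by exactly one word character (surrounded by word boundaries)
You can simply change the regex so that the surrounding word characters are optional, for example
\b\w*compared\w*\b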
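To make the point concrete, here is a small check that uses only Python's re module on the sentence from the question. It compares a pattern that demands exactly one word character on each side of compared with one that allows zero or more. It also shows a separate pitfall that may affect the question's code: in a normal (non-raw) Python string, "\b" is the backspace character, so regex patterns are safer written as raw strings. The variable names are illustrative only.

```python
import re

text = "Net income was $9.4 million acompared to the prior year of $2.7 million."

# In a non-raw string literal, "\b" is the backspace character (0x08),
# not the regex word-boundary escape. A raw string keeps the backslash.
print("\b" == "\x08")   # True
print(r"\b" == "\\b")   # True

# Exactly one word character before and after "compared".
strict = re.compile(r"\b\wcompared\w\b")
# Zero or more word characters before and after "compared".
loose = re.compile(r"\b\w*compared\w*\b")

strict_hits = strict.findall(text)
loose_hits = loose.findall(text)

print(strict_hits)  # []
print(loose_hits)   # ['acompared']
```

The strict pattern fails on acompared because nothing follows the final d, while the loose pattern accepts it.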
Answer 1 (score: 1)
If we want to find any word with compared in it, maybe this expression will work:
\b\w*(?:compared)\w*\b
Test with re.finditer:
import re
regex = r"\b\w*(?:compared)\w*\b"
test_str = "some text you wish before then compared or anythingcompared or any_thing_01_compared_anything_after_that "
matches = re.finditer(regex, test_str)
for matchNum, match in enumerate(matches, start=1):
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))
    # The pattern has no capturing groups, so this inner loop does not run here.
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))
If we want to find the whole string that has compared in it, I guess this expression in m mode,
^(?=[\s\S]*\bacompared\b|\bthiscompared\b|\bnotcompared\b)[\s\S]*$
or this one in s mode,
^(?=.*\bacompared\b|\bthiscompared\b|\bnotcompared\b).*$
might be a start toward solving the problem.
Test with re.findall:
import re
regex = r"^(?=.*\bacompared\b|\bthiscompared\b|\bnotcompared\b).*$"
test_str = ("Net income was $9.4 million acompared to the prior year of $2.7 million.,\n\n"
            "some other words with new lines")
# re.DOTALL is the s mode: it lets . match newlines as well.
print(re.findall(regex, test_str, re.DOTALL))
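One detail in the lookahead may be worth checking before relying on it: alternation binds more loosely than concatenation, so the leading .* applies only to the first alternative (acompared), and the other two are tested only at the very start of the string. The sketch below compares the expression as written with a grouped variant; the grouped pattern and the sample sentence are my own illustration, not part of the original answer.

```python
import re

# The s-mode expression as written in the answer.
original = r"^(?=.*\bacompared\b|\bthiscompared\b|\bnotcompared\b).*$"
# Hypothetical variant: the alternatives are grouped so .* applies to all of them.
grouped = r"^(?=.*\b(?:acompared|thiscompared|notcompared)\b).*$"

# Illustrative sample in which the keyword is not at the start of the string.
sample = "results were notcompared to the prior year"

original_hits = re.findall(original, sample, re.DOTALL)
grouped_hits = re.findall(grouped, sample, re.DOTALL)

print(original_hits)  # []
print(grouped_hits)   # ['results were notcompared to the prior year']
```

If the intent is "the string contains any of the three words anywhere", the grouped form appears to be the closer fit.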
Answer 2 (score: 1)
Although the answers above work with Python re, spaCy requires a specific pattern description format. The pattern should use the "TEXT" key instead. For example,
pattern = [{"TEXT": {"REGEX": "compared*"}}]
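A note on the regex in that example: compared* means compare followed by zero or more d characters, so it also accepts compare. The sketch below uses only the standard library and imitates per-token matching by splitting on whitespace and searching each piece. That is an assumption about how a token-level REGEX behaves, not spaCy's actual code, and spaCy's tokenizer splits text differently (punctuation, for instance). The sample sentence and the helper tokens_matching are hypothetical.

```python
import re

# Illustrative sentence; not taken from the question.
text = "Net income was $9.4 million acompared to the prior year; we compare and thiscompared again."

# Crude stand-in for tokenization: split on whitespace.
tokens = text.split()

def tokens_matching(pattern, tokens):
    """Return (index, token) pairs where the pattern matches anywhere in the token."""
    rx = re.compile(pattern)
    return [(i, tok) for i, tok in enumerate(tokens) if rx.search(tok)]

star_hits = tokens_matching(r"compared*", tokens)
plain_hits = tokens_matching(r"compared", tokens)

print(star_hits)   # [(5, 'acompared'), (11, 'compare'), (13, 'thiscompared')]
print(plain_hits)  # [(5, 'acompared'), (13, 'thiscompared')]
```

If only tokens containing the full word compared are wanted, dropping the trailing * is probably the safer choice.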