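Does spaCy provide a fast conversion from LIKE_NUM tokens to Python float or decimal?

spaCy can match LIKE_NUM tokens such as "31,2", "10.9", "10", "ten", etc. Does it also provide a quick way to get the corresponding Python number? I expected a method like .get_value() that returns a number (not a string), but I could not find anything.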
import spacy
from spacy.matcher import Matcher

nlp = spacy.load('en_core_web_sm')
matcher = Matcher(nlp.vocab)
text = "this is just a text and a number 10,2 or 10.2 meaning ten point two"
doc = nlp(text)

# Match any single token that spaCy flags as number-like
pattern = [{"LIKE_NUM": True}]
# spaCy v2 signature; in spaCy v3 this is matcher.add("number_match", [pattern])
matcher.add("number_match", None, pattern)
matches = matcher(doc)

print("All matches:")
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]  # Get string representation
    span = doc[start:end]  # The matched span
    print(match_id, string_id, start, end, span.text)
    print(type(span.text))  # Always str, never a numeric type
The output is:
All matches:
13316671205374851783 number_match 8 9 10,2
<class 'str'>
13316671205374851783 number_match 10 11 10.2
<class 'str'>
13316671205374851783 number_match 12 13 ten
<class 'str'>
13316671205374851783 number_match 14 15 two
<class 'str'>
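
To make the goal concrete, here is a minimal sketch of the kind of conversion I am after, written in plain Python on the matched text. It is not a spaCy API: the function name like_num_to_float and the WORD_VALUES table are hypothetical, the table covers only a few single English number words, and a comma is assumed to be a decimal separator (so "1,000" would come out as 1.0, not 1000.0).

```python
from typing import Optional

# Hypothetical lookup table: only a handful of single English number words.
WORD_VALUES = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}


def like_num_to_float(text: str) -> Optional[float]:
    """Best-effort conversion of a number-like token text to a float.

    Returns None when the text cannot be interpreted.
    """
    cleaned = text.strip().lower()

    # Spelled-out numbers, limited to the small table above.
    if cleaned in WORD_VALUES:
        return float(WORD_VALUES[cleaned])

    # Assumption: a comma is a decimal separator ("10,2" -> 10.2),
    # not a thousands separator.
    try:
        return float(cleaned.replace(",", "."))
    except ValueError:
        return None
```

With the matcher from the question, this could be applied as like_num_to_float(doc[start:end].text) inside the loop. Fractions such as "1/2" and multi-word numbers are not handled and return None.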