Is there a way to extract everything that follows a given word as an entity? For example, I want to extract everything after "about", "go to", or "learn".
##intent:navigate
-I want to learn about linear regression
-I want to read about SVM
-I want to go to Python 2.6
-Take me to logistic regression: eval
##regex:topic
-^[A-Za-z0-9 :_ -][A-Za-z0-9 :_ -][A-Za-z0-9 :_ -]$
Answer 0 (score: 0)
A naive approach can be very simple, for example using the string split method:
sentences = [
    "I want to learn about linear regression",
    "I want to read about SVM",
    "I want to go to Python 2.6",
    "Take me to logistic regression: eval",
]
split_terms = ["about", "go to", "learn"]

for sentence in sentences:
    for split_term in split_terms:
        try:
            # Everything after the first occurrence of the split term
            print(sentence.split(split_term)[1])
        except IndexError:
            pass  # split_term was not found in the sentence
Result:
linear regression
about linear regression
SVM
Python 2.6
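A smarter approach might be to find the last "split term" in each sentence first, which resolves the overlap between "learn" and "learn about":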
for sentence in sentences:
    # Track the split term that starts furthest to the right in the sentence.
    last_split_term_index = -1
    last_split_term = ""
    for split_term in split_terms:
        last_split_term_index_candidate = sentence.find(split_term)
        if last_split_term_index_candidate > last_split_term_index:
            last_split_term_index = last_split_term_index_candidate
            last_split_term = split_term
    if not last_split_term:
        continue  # no split term was found in this sentence
    print(sentence.split(last_split_term, 1)[1])
Result:
linear regression
SVM
Python 2.6
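The same "take what follows the last split term" idea can also be sketched with a regular expression. This is only an illustration, not part of the original answer: the helper name `extract_after_last_term` is hypothetical, and the word-boundary matching is an added assumption so that a term such as "about" is not matched inside a longer word.

```python
import re

def extract_after_last_term(sentence, split_terms):
    """Return the text after the right-most split term, or None if none occurs."""
    # Longest terms first, so that at a given position a longer term wins
    # over a shorter one that is its prefix.
    ordered = sorted(split_terms, key=len, reverse=True)
    pattern = "|".join(re.escape(term) for term in ordered)
    # \b restricts matches to whole words (assumption, see lead-in).
    matches = list(re.finditer(rf"\b(?:{pattern})\b", sentence))
    if not matches:
        return None
    return sentence[matches[-1].end():].strip()

sentences = [
    "I want to learn about linear regression",
    "I want to read about SVM",
    "I want to go to Python 2.6",
    "Take me to logistic regression: eval",
]
split_terms = ["about", "go to", "learn"]

for sentence in sentences:
    print(extract_after_last_term(sentence, split_terms))
```

Returning `None` instead of silently skipping makes the unmatched case ("Take me to ...") visible to the caller, who can then decide whether to add "to" as a further split term.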
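Answer 1 (score: 0)
Yes. You have to annotate the entities in the training data, and the model will then extract them. For your example, the training data should look like this: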
##intent:navigate
- I want to learn about [linear regression](topic)
- I want to talk about [RasaNLU](topic) for the rest of the day.
- I want to go to [Berlin](topic) for a specific work.
- I want to read about [SVM](topic)
- I want to go to [Python 2.6](topic)
- Take me to logistic regression: eval
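To make the `[text](label)` annotation concrete, here is a small sketch that turns one annotated line into plain text plus entity offsets. It is not Rasa's own loader; `parse_example` and `ANNOTATION` are hypothetical names used only for illustration.

```python
import re

# Matches the [entity text](entity_label) annotation used in the examples above.
ANNOTATION = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")

def parse_example(line):
    """Return (plain_text, entities) for one annotated training line."""
    entities = []
    plain_parts = []
    cursor = 0  # current position in the annotated line
    offset = 0  # length of the plain text built so far
    for match in ANNOTATION.finditer(line):
        before = line[cursor:match.start()]
        plain_parts.append(before)
        offset += len(before)
        value, label = match.group(1), match.group(2)
        entities.append(
            {"start": offset, "end": offset + len(value), "value": value, "entity": label}
        )
        plain_parts.append(value)
        offset += len(value)
        cursor = match.end()
    plain_parts.append(line[cursor:])
    return "".join(plain_parts), entities

text, entities = parse_example(
    "I want to talk about [RasaNLU](topic) for the rest of the day."
)
print(text)
print(entities)
```

The offsets are character positions in the plain text, so `text[start:end]` gives back the annotated value.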
After training the model, I tried an example:
Enter a message: I want to talk about SVM
{
  "intent": {
    "name": "navigate",
    "confidence": 0.9576369524002075
  },
  "entities": [
    {
      "start": 21,
      "end": 24,
      "value": "SVM",
      "entity": "topic",
      "confidence": 0.8241770362411013,
      "extractor": "CRFEntityExtractor"
    }
  ]
}
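However, for this approach to work you will have to define more examples that cover all the possible patterns. An example such as "I want to talk about RasaNLU for the rest of the day." teaches the model that the entity to extract does not have to come at the end of the sentence (which is the case in the rest of the examples).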
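Once the model returns a result like the one shown earlier, reading the topic out of it is plain dictionary access. The sketch below works on a copy of that JSON; the helper `entities_of_type` and its `min_confidence` parameter are hypothetical and not part of Rasa.

```python
import json

# Parse result copied from the example run in this answer.
result = json.loads("""
{
  "intent": {"name": "navigate", "confidence": 0.9576369524002075},
  "entities": [
    {
      "start": 21,
      "end": 24,
      "value": "SVM",
      "entity": "topic",
      "confidence": 0.8241770362411013,
      "extractor": "CRFEntityExtractor"
    }
  ]
}
""")
message = "I want to talk about SVM"

def entities_of_type(result, entity_type, min_confidence=0.0):
    """Collect the values of all entities with the given label."""
    return [
        entity["value"]
        for entity in result.get("entities", [])
        if entity["entity"] == entity_type
        and entity.get("confidence", 1.0) >= min_confidence
    ]

topics = entities_of_type(result, "topic")
print(topics)

# "start" and "end" are character offsets into the original message.
for entity in result["entities"]:
    assert message[entity["start"]:entity["end"]] == entity["value"]
```

A confidence threshold is one simple way to ignore weak extractions while the training data is still small.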