RASA NLU: I want to extract anything (words, numbers, or special characters) after a word as an entity

Time: 2019-05-09 16:33:16

Tags: nlp rasa-nlu named-entity-extraction

Is there a way to extract whatever follows a given word as an entity? For example:

I want to extract everything after "about", "go to", or "learn".

##intent:navigate
-I want to learn about linear regression
-I want to read about SVM
-I want to go to Python 2.6
-Take me to logistic regression: eval

##regex:topic
-^[A-Za-z0-9 :_ -][A-Za-z0-9 :_ -][A-Za-z0-9 :_ -]$

2 answers:

Answer 0 (score: 0)

A naive approach can be very simple, for example using the string split method:

sentences = ["I want to learn about linear regression", "I want to read about SVM", "I want to go to Python 2.6",
 "Take me to logistic regression: eval"]

split_terms = ["about", "go to", "learn"]

for sentence in sentences:
    for split_term in split_terms:
        try:
            print(sentence.split(split_term)[1])
        except IndexError:
            pass # split_term was not found in a sentence

Result:

 linear regression
 about linear regression
 SVM
 Python 2.6

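A smarter approach might be to find the last "split term" in the sentence first. This resolves the "learn about" problem, where both "learn" and "about" match the same sentence: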

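for sentence in sentences:
    # Track the split term that starts furthest to the right in the sentence.
    last_split_term_index = -1
    last_split_term = ""
    for split_term in split_terms:
        last_split_term_index_candidate = sentence.rfind(split_term)
        if last_split_term_index_candidate > last_split_term_index:
            last_split_term_index = last_split_term_index_candidate
            last_split_term = split_term
    if last_split_term_index == -1:
        continue  # no split term was found in this sentence
    # Print everything that follows the last split term.
    print(sentence[last_split_term_index + len(last_split_term):])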

Result:

 linear regression
 SVM
 Python 2.6
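
The same "last trigger wins" idea can also be written as a single regular expression. The following is only a sketch: the names TRIGGERS, TOPIC_PATTERN and extract_topic are made up for illustration, and the pattern assumes the topic always runs to the end of the sentence.

```python
import re

# Trigger phrases taken from the question; all names here are illustrative.
TRIGGERS = ["about", "go to", "learn"]

# The leading greedy ".*" pushes the match to the LAST trigger in the
# sentence; the capture group takes everything that follows it.
TOPIC_PATTERN = re.compile(
    r".*\b(?:" + "|".join(re.escape(t) for t in TRIGGERS) + r")\b\s+(.+)$"
)


def extract_topic(sentence):
    """Return the text after the last trigger phrase, or None if none occurs."""
    match = TOPIC_PATTERN.match(sentence)
    return match.group(1).strip() if match else None


sentences = [
    "I want to learn about linear regression",
    "I want to read about SVM",
    "I want to go to Python 2.6",
    "Take me to logistic regression: eval",
]

for sentence in sentences:
    print(repr(extract_topic(sentence)))
```

As with the split-based versions, the last sentence yields nothing, because none of the three trigger phrases occurs in it. Adding "to" as a trigger would catch it, but would also need care with sentences such as "I want to go to ...".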

Answer 1 (score: 0)

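Yes. You will have to define the entities in your training data, and the model will then extract them. For your examples, the training data should look like this: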

##intent:navigate
- I want to learn about [linear regression](topic)
- I want to talk about [RasaNLU](topic) for the rest of the day.
- I want to go to [Berlin](topic) for a specific work.
- I want to read about [SVM](topic)
- I want to go to [Python 2.6](topic)
- Take me to logistic regression: eval

After training the model, I tried an example:

Enter a message: I want to talk about SVM     
{
  "intent": {
    "name": "navigate",
    "confidence": 0.9576369524002075
  },
  "entities": [
    {
      "start": 21,
      "end": 24,
      "value": "SVM",
      "entity": "topic",
      "confidence": 0.8241770362411013,
      "extractor": "CRFEntityExtractor"
    }
  ]
}
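
To use such a result downstream, the topic values can be read from the "entities" list. The sketch below works on the JSON shown above. The helper name topics_from is made up, and treating "start" and "end" as character offsets into the message is an assumption, although it is consistent with this example (characters 21 to 24 are "SVM").

```python
import json

# The message and parser output are copied from the example above.
message = "I want to talk about SVM"
raw_result = """
{
  "intent": {"name": "navigate", "confidence": 0.9576369524002075},
  "entities": [
    {"start": 21, "end": 24, "value": "SVM", "entity": "topic",
     "confidence": 0.8241770362411013, "extractor": "CRFEntityExtractor"}
  ]
}
"""


def topics_from(result, text):
    """Collect (value, text slice) pairs for every entity labelled 'topic'."""
    pairs = []
    for entity in result["entities"]:
        if entity["entity"] == "topic":
            # Assumption: start/end are character offsets into the message.
            pairs.append((entity["value"], text[entity["start"]:entity["end"]]))
    return pairs


result = json.loads(raw_result)
print(topics_from(result, message))
```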

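For this approach to work, however, you will have to define more examples that cover all the possible patterns. For instance, the example "I want to talk about RasaNLU for the rest of the day." teaches the model that the entity to extract does not have to be the last word of the sentence (as it is in the rest of the examples).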
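
One way to cover more sentence patterns without writing every line by hand is to generate the annotated examples from templates. This is a minimal sketch, not part of Rasa: the templates, the topic list and the function build_training_examples are hypothetical, and the "## intent:" header format is an assumption that should be checked against the Rasa NLU version in use.

```python
# Hypothetical sentence templates; "{topic}" marks where the entity goes.
TEMPLATES = [
    "I want to learn about {topic}",
    "I want to read about {topic}",
    "I want to go to {topic}",
    "I want to talk about {topic} for the rest of the day.",
]

# Topic values taken from the examples in this thread.
TOPICS = ["linear regression", "SVM", "Python 2.6"]


def build_training_examples(templates, topics, intent="navigate", entity="topic"):
    """Return markdown training lines with every topic annotated as an entity."""
    lines = ["## intent:" + intent]
    for template in templates:
        for topic in topics:
            # Produces e.g. "[SVM](topic)", the annotation style used above.
            annotated = "[{}]({})".format(topic, entity)
            lines.append("- " + template.format(topic=annotated))
    return lines


print("\n".join(build_training_examples(TEMPLATES, TOPICS)))
```

Mixing templates where the topic ends the sentence with templates where more words follow it gives the model examples of both positions.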