Question

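I'm building my own home assistant and am trying to set up a multi-intent classification system. However, I can't find a way to split the query spoken by the user into the multiple distinct intents it contains.

For example: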

I have my data for one of my intents (same format for all) 

{"intent_name": "music.off" , "examples": ["turn off the music" , "kill the music" , "cut the music"]}

and the query spoken by the user would be:

'dim the lights, cut the music and play Black Mirror on tv'

I want to split the sentence into its individual intents, like this:

['dim the lights', 'cut the music', 'play black mirror on tv']

However, I can't simply use re.split on the sentence with `and` and `,` as delimiters, because if the user asks:

'turn the lights off in the living room, dining room, kitchen and bedroom'

this would be split into

['turn the lights off in the living room', 'kitchen', 'dining room', 'bedroom']

which is unusable for my intent detection.
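
To make the failure concrete, here is a minimal sketch of the naive approach. The helper name `naive_split` is hypothetical; it only uses the standard `re` module and splits on every comma and every standalone `and`.

```python
import re


def naive_split(query):
    """Split on commas and the word 'and', ignoring intent boundaries."""
    parts = re.split(r",|\band\b", query)
    # Drop empty pieces and surrounding whitespace
    return [part.strip() for part in parts if part.strip()]


multi_intent = "dim the lights, cut the music and play Black Mirror on tv"
single_intent = "turn the lights off in the living room, dining room, kitchen and bedroom"

# Works by luck: each piece happens to be one intent
print(naive_split(multi_intent))
# Breaks: one intent with four rooms is cut into four pieces
print(naive_split(single_intent))
```

The second call returns four fragments, three of which are bare room names with no verb, so an intent classifier has nothing to work with.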

That's my problem, thanks in advance.

Update

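OK, this is as far as I've got with my code. It can pull the examples from my data and identify the different intents in the query as I wanted, but it does not split the original query into the parts that belong to each intent; it only matches.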

import json
import os

from fuzzywuzzy import process

text = "dim the lights, shut down the music and play White Collar"

choices = []   # one list of example phrases per intent file
commands = []  # best-matching example per intent


def get_matches():
    # Load the example phrases from every intent file under ./data
    for root, dirs, files in os.walk("./data"):
        for filename in files:
            with open(os.path.join(root, filename), "r") as f:
                data = json.load(f)
            choices.append(data["examples"])

    # For each intent, keep the single example closest to the whole query
    for set_ in choices:
        command = process.extract(text, set_, limit=1)
        commands.append(command)

    print(f"all commands : {commands}")


get_matches()

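This returns [('dim the lights') , ('turn off the music') , ('play Black Mirror')], which are the correct intents, but I have no way of knowing which part of the query relates to each intent. That is the main problem.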

My data looks like this. It is very simple for now, until I figure out an approach.

play.json

{"intent_name": "play.device" , "examples" : ["play Black Mirror" , "play Netflix on tv" , "can you please stream Stranger Things"]}

music.json

{"intent_name": "music.off" , "examples": ["turn off the music" , "cut the music" , "kill the music"]}

lights.json

{"intent_name": "lights.dim" , "examples" : ["dim the lights" , "turn down the lights" , "lower the brightness"]}

Answer 1

It seems you are mixing two problems in your question:

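1. Multiple independent intents within a single query (e.g. shut down the music and play White Collar)

2. Multiple slots (in the form-filling framework) within a single intent (e.g. turn the lights off in the living room bedroom and kitchen).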

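These problems are quite different. Both, however, can be formulated as a word tagging problem (similar to POS tagging) and solved with machine learning (e.g. a CRF or a bi-LSTM over pretrained word embeddings, predicting a label for each word).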
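
As a rough illustration of what "word tagging" means in practice, a CRF-style tagger usually consumes one feature dictionary per token. The sketch below builds such dictionaries with plain Python. The feature names and the helper names `token_features` and `sentence_features` are my own assumptions; the answer does not prescribe a feature set or a library, although a list of per-token dicts is the input shape that CRF packages such as sklearn-crfsuite accept.

```python
def token_features(tokens, i):
    """Describe token i using itself and its immediate neighbours."""
    word = tokens[i]
    return {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),  # hints at names such as "White Collar"
        "prev.lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next.lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def sentence_features(tokens):
    """One feature dict per token, in sentence order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


tokens = "shut down the music and play White Collar".split()
features = sentence_features(tokens)
for token, feats in zip(tokens, features):
    print(token, feats)
```

The neighbour features matter here: a word like `and` is only a boundary between intents when what follows looks like the start of a new command.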

The intent label for each word can be created using BIO notation, e.g.

shut   B-music_off
down   I-music_off
the    I-music_off
music  I-music_off
and    O
play   B-tv_on
White  I-tv_on
Collar I-tv_on

turn    B-light_off
the     I-light_off
lights  I-light_off
off     I-light_off
in      I-light_off
the     I-light_off
living  I-light_off
room    I-light_off
bedroom I-light_off
and     I-light_off
kitchen I-light_off
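
Once a tagger has produced tags like the ones above, turning them back into one text segment per intent is mechanical. Here is a minimal sketch, assuming the tags arrive as a list of strings aligned with the tokens; the function name `bio_to_segments` is hypothetical.

```python
def bio_to_segments(tokens, tags):
    """Group tokens into (intent, text) segments according to BIO tags."""
    segments = []
    current_label, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        # A B- tag, or an I- tag with a different label, opens a new segment
        starts_new = prefix == "B" or (prefix == "I" and label != current_label)
        if prefix == "O" or starts_new:
            if current_tokens:
                segments.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
        if prefix in ("B", "I"):
            current_label = label
            current_tokens.append(token)
    if current_tokens:  # flush the last open segment
        segments.append((current_label, " ".join(current_tokens)))
    return segments


tokens = "shut down the music and play White Collar".split()
tags = ["B-music_off", "I-music_off", "I-music_off", "I-music_off",
        "O", "B-tv_on", "I-tv_on", "I-tv_on"]
print(bio_to_segments(tokens, tags))
# [('music_off', 'shut down the music'), ('tv_on', 'play White Collar')]

light_tokens = "turn the lights off in the living room bedroom and kitchen".split()
light_tags = ["B-light_off"] + ["I-light_off"] * 10
print(bio_to_segments(light_tokens, light_tags))
```

Because the `and` in the lights query is tagged as inside the intent rather than `O`, the whole sentence comes back as a single segment.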

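The model will read the sentence and predict the labels. It should be trained on at least hundreds of examples; you will have to generate or mine them.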
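
One way to generate such examples is to stitch single-intent phrases together and label them automatically. The sketch below is only an illustration of that idea: the intents mirror the JSON files from the question, while the connector words, the helper names, and the whitespace tokenization (no comma handling) are assumptions of mine.

```python
import itertools
import random

INTENTS = {
    "lights.dim": ["dim the lights", "turn down the lights", "lower the brightness"],
    "music.off": ["turn off the music", "cut the music", "kill the music"],
    "play.device": ["play Black Mirror", "play Netflix on tv",
                    "can you please stream Stranger Things"],
}
CONNECTORS = ["and", "then", "and also"]  # hypothetical glue between commands


def tag_phrase(phrase, intent):
    """Label the first token B-<intent> and the remaining tokens I-<intent>."""
    tokens = phrase.split()
    tags = [f"B-{intent}"] + [f"I-{intent}"] * (len(tokens) - 1)
    return tokens, tags


def make_example(intent_names, rng):
    """Join one random phrase per intent, tagging connector words as O."""
    tokens, tags = [], []
    for position, intent in enumerate(intent_names):
        if position > 0:
            connector = rng.choice(CONNECTORS).split()
            tokens += connector
            tags += ["O"] * len(connector)
        phrase_tokens, phrase_tags = tag_phrase(rng.choice(INTENTS[intent]), intent)
        tokens += phrase_tokens
        tags += phrase_tags
    return tokens, tags


rng = random.Random(0)  # fixed seed so the generated set is reproducible
dataset = [make_example(pair, rng) for pair in itertools.permutations(INTENTS, 2)]
for tokens, tags in dataset:
    print(list(zip(tokens, tags)))
```

Synthetic data like this only covers the phrasings you already have, so it is best treated as a starting point to mix with real user queries.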

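After splitting the intents with a model trained on such labels, you will have short texts that each correspond to a single intent. Then, for each short text, you need to run a second segmentation that looks for slots. E.g. the sentence about the lights can be represented as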

turn    B-action
the     I-action
lights  I-action
off     I-action
in      O
the     B-place
living  I-place
room    I-place
bedroom B-place
and     O
kitchen B-place

Now the BIO markup helps a lot: the B-place tag separates bedroom from the living room.
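
To see why the B- prefix matters, the sketch below collects slot values from the tags shown above and then repeats the extraction with every B- replaced by I-. The function name `extract_slots` is hypothetical, and the second tag list is a deliberately degraded variant made up for comparison.

```python
from collections import defaultdict


def extract_slots(tokens, tags):
    """Collect slot values per slot name from BIO tags."""
    slots = defaultdict(list)
    label, words = None, []
    # A trailing "O" sentinel closes the final open span
    for token, tag in zip(list(tokens) + [""], list(tags) + ["O"]):
        prefix, _, name = tag.partition("-")
        if prefix == "I" and name == label:
            words.append(token)  # continue the open span
            continue
        if words:  # close the open span
            slots[label].append(" ".join(words))
        label, words = (name, [token]) if prefix in ("B", "I") else (None, [])
    return dict(slots)


tokens = "turn the lights off in the living room bedroom and kitchen".split()
bio_tags = ["B-action", "I-action", "I-action", "I-action", "O",
            "B-place", "I-place", "I-place", "B-place", "O", "B-place"]
print(extract_slots(tokens, bio_tags))
# {'action': ['turn the lights off'], 'place': ['the living room', 'bedroom', 'kitchen']}

# Same tags without the B- prefix: adjacent places can no longer be told apart
io_tags = [tag.replace("B-", "I-") for tag in bio_tags]
print(extract_slots(tokens, io_tags))
# {'action': ['turn the lights off'], 'place': ['the living room bedroom', 'kitchen']}
```

With BIO tags the three rooms come out as three separate values; without the B- marker, `the living room bedroom` collapses into one place.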

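In principle, both segmentations can be performed by one hierarchical end-to-end model (google "semantic parsing" if you want that), but I feel that two simpler taggers can work as well.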
