Question

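I have reached the final assignment of the text mining module and I am struggling to solve the problem it poses, so I am asking for help to move the work forward in Python: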

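The goal is to write a function that receives the user's text as input and returns the text fragments (chunks) that refer to the food being ordered and its quantity. Building an intent classifier ahead of this function is not required, nor is it the purpose of the exercise; we simply assume the function receives phrases whose intent is "Order_food". Normalizing the output is not an objective either (for example, there is no need to convert a spelled-out number to "3" or a plural form to "pizza"). These are therefore the minimum requirements.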

For example: "quiero 3 bocadillos de anchoas y 2 pizzas" →

 {comida:'bocadillo', ingrediente:'anchoas', cantidad:3},
 {comida:'pizza', ingrediente:'null', cantidad:2}

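The output of the function will therefore be an array of dictionaries with 2 elements each (food and quantity). When no quantity is detected, its value defaults to "1".

Best regards.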

Answer 1

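In Python, I would use the NLTK library to tag the parts of speech in the sentence.

For example:

Please give me 2 sandwiches and a pizza

is tagged as

Please/NN Give/VBP me/PRP 2/CD sandwiches/NNS and/CC a/DT pizza/NN

Please give me 3 sandwiches and pizzas

Please/NN Give/VBP me/PRP 3/CD sandwiches/NNS and/CC pizzas/NNS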

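Using the tags, I would clean up the sentence (keeping only /CD, /DT, /NN and /NNS):

Please/NN 2/CD sandwiches/NNS a/DT pizza/NN
Please/NN 3/CD sandwiches/NNS pizzas/NNS

Find the first occurrence of /CD; if there is none, the first /DT; failing that, the first /NN. Drop everything before it:

2/CD sandwiches/NNS a/DT pizza/NN
3/CD sandwiches/NNS pizzas/NNS

Treat /DT as 1, and if there is no /CD or /DT between a /NN and a /NNS, I would assume a 1 between them.

The final result is below; you can parse it into whatever format you need.

2, sandwiches, 1, pizza
3, sandwiches, 1, pizzas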
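The steps above can be sketched as a small function. This is only an illustration that works on already tagged (word, tag) pairs; the name `clean_tagged` is made up here, and the tags are copied from the examples rather than produced by a tagger.

```python
KEEP = {"CD", "DT", "NN", "NNS"}


def clean_tagged(tagged):
    """Reduce (word, tag) pairs to a flat [quantity, food, ...] list."""
    # Step 1: keep only numbers, determiners and nouns.
    kept = [(w, t) for w, t in tagged if t in KEEP]

    # Step 2: start at the first CD; failing that, the first DT, then the first NN.
    start = None
    for wanted in ("CD", "DT", "NN"):
        start = next((i for i, (_, t) in enumerate(kept) if t == wanted), None)
        if start is not None:
            break
    if start is None:
        return []
    kept = kept[start:]

    # Step 3: a DT counts as 1, and a noun with no CD/DT before it also gets 1.
    result = []
    has_quantity = False
    for word, tag in kept:
        if tag == "CD":
            result.append(word)
            has_quantity = True
        elif tag == "DT":
            result.append("1")
            has_quantity = True
        else:  # NN or NNS
            if not has_quantity:
                result.append("1")
            result.append(word)
            has_quantity = False
    return result


first = [("Please", "NN"), ("Give", "VBP"), ("me", "PRP"), ("2", "CD"),
         ("sandwiches", "NNS"), ("and", "CC"), ("a", "DT"), ("pizza", "NN")]
second = [("Please", "NN"), ("Give", "VBP"), ("me", "PRP"), ("3", "CD"),
          ("sandwiches", "NNS"), ("and", "CC"), ("pizzas", "NNS")]

print(clean_tagged(first))   # ['2', 'sandwiches', '1', 'pizza']
print(clean_tagged(second))  # ['3', 'sandwiches', '1', 'pizzas']
```

Working on pre-tagged pairs keeps the sketch independent of which tagger is used; in practice the pairs would come from something like `nltk.pos_tag`.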

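This is just an example to get you started, and it has many flaws, such as multiple occurrences of /NN or /NNS depending on the string passed in, as well as the language of that string.

But I hope the community can expand on this and offer better logic for classifying the sentence once it has been tagged.

You can clean up the data by walking through the array. Here is an example: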

# Tagged sentence as (word, tag) pairs, in the shape returned by nltk.pos_tag
mySentance = [('please', 'VB'), ('give', 'VB'), ('me', 'PRP'), ('2', 'CD'), ('sandwiches', 'NNS'), ('and', 'CC'), ('a', 'DT'), ('pizzas', 'NN')]
newData = ""
StartCapture = False

for word, tag in mySentance:
    # Start capturing at the first quantity (CD) or determiner (DT)
    if tag in ("CD", "DT"):
        StartCapture = True

    if StartCapture:
        if tag == "DT":
            # A determiner such as "a" counts as a quantity of 1
            newData += "1 "
        elif tag in ("CD", "NN", "NNS"):
            # Keep numbers and nouns as they are
            newData += word + " "
print(newData)

Result: 2 sandwiches 1 pizzas
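To get from that flat string to the array of dictionaries the question asks for, something along these lines might work. It is a rough sketch: `to_orders` is a hypothetical helper, the keys `comida` and `cantidad` are borrowed from the question, and it assumes quantities are written as digits (a word such as "three" would be mistaken for a food).

```python
def to_orders(newData):
    """Turn 'quantity food quantity food ...' into a list of dicts."""
    orders = []
    cantidad = 1  # default when no quantity precedes a food
    for word in newData.split():
        if word.isdigit():
            # Remember the quantity for the next food word
            cantidad = int(word)
        else:
            orders.append({"comida": word, "cantidad": cantidad})
            cantidad = 1  # reset to the default for the next item
    return orders


print(to_orders("2 sandwiches 1 pizzas "))
# [{'comida': 'sandwiches', 'cantidad': 2}, {'comida': 'pizzas', 'cantidad': 1}]
```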

Answer 2

Thank you, Vishal.
Here is my code. I still need to improve it.

# Import required modules
import nltk

def chunk_texto(text):
    # Tokenize and POS-tag the input with NLTK's default tagger
    tokens = nltk.word_tokenize(text)
    mySentance = nltk.pos_tag(tokens)
    newData = ""
    StartCapture = False

    # Words to skip even when they are tagged as foreign words (FW)
    stopwords = {'de'}

    for word, tag in mySentance:
        # Start capturing at the first quantity (CD) or determiner (DT)
        if tag in ("CD", "DT"):
            StartCapture = True

        if StartCapture:
            # Each match is prepended, so the output comes out in reverse order
            if tag in ("NN", "NNS"):
                newData = " Comida: " + word + ", " + newData + " "

            if tag == "FW" and word not in stopwords:
                newData = " ingrediente: " + word + ", " + newData + " "

            if tag == "CD":
                newData = " Quantidade: " + word + ", " + newData + " "

            if tag == "DT":
                newData = newData + "1 "

    # Return the text so the caller's print() does not also print None
    return newData


print(chunk_texto("quiero 3 bocadillos de anchoas y 2 pizzas"))

The result is:

Comida: pizzas,  Quantidade: 2,  ingrediente: anchoas,  Comida: bocadillos,  
Quantidade: 3,      

But I want the result in this format:

{comida:'bocadillo', ingrediente:'anchoas', Quantidade:3},
{comida:'pizza', ingrediente:'null', Quantidade:2}
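One possible way to get closer to that shape is to build one dictionary per noun instead of concatenating strings. The sketch below is only a suggestion under stated assumptions: `chunk_orders` is a made-up name, the tags in `tagged` are hypothetical (chosen to be consistent with the output shown above, not taken from a real tagger run), `None` stands in for 'null', and words are not singularized since the question says normalization is not required.

```python
def chunk_orders(tagged, stopwords=frozenset({"de"})):
    """Group (word, tag) pairs into one dict per food item."""
    orders = []
    cantidad = None   # quantity waiting for its food
    started = False   # nothing is captured before the first CD/DT
    for word, tag in tagged:
        if tag in ("CD", "DT"):
            started = True
        if not started:
            continue
        if tag == "CD":
            cantidad = int(word) if word.isdigit() else word
        elif tag == "DT":
            cantidad = 1
        elif tag in ("NN", "NNS"):
            # A noun opens a new order; a missing quantity defaults to 1.
            orders.append({"comida": word, "ingrediente": None,
                           "cantidad": cantidad if cantidad is not None else 1})
            cantidad = None
        elif tag == "FW" and word not in stopwords and orders:
            # A foreign word after a food is taken as its ingredient.
            orders[-1]["ingrediente"] = word
    return orders


# Hypothetical tags for the example sentence (an assumption, see above).
tagged = [("quiero", "NN"), ("3", "CD"), ("bocadillos", "NNS"), ("de", "FW"),
          ("anchoas", "FW"), ("y", "CC"), ("2", "CD"), ("pizzas", "NNS")]

for order in chunk_orders(tagged):
    print(order)
# {'comida': 'bocadillos', 'ingrediente': 'anchoas', 'cantidad': 3}
# {'comida': 'pizzas', 'ingrediente': None, 'cantidad': 2}
```

Appending to a list keeps the items in sentence order, which avoids the reversed output that comes from prepending to a string.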


Create an algorithm from text fragments (chunks)
