Question

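The problem of spelling mistakes: when you build an application that takes user input, typos are likely. Spell-checker libraries exist to handle them, but user-defined data may not be in any dictionary. For example, you are building a chatbot in which the user enters a location name to search for restaurants.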

Answer 1

Database-layer solution:

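To handle this, you can use the Soundex API. Soundex is a standard algorithm that is available as a small library in most languages, and it can also be used directly in database SQL queries.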

Here is one valid SQL query for a MySQL database:

    SELECT DISTINCT r_name FROM restuarant WHERE area = 'South' AND SOUNDEX(cost) = SOUNDEX('cheep')

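In the example above, the database may have the entry 'cheap', but the user has typed 'cheep'. Because both words have the same Soundex code, the query still returns the valid records with cost = 'cheap'.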
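
To try the idea without a MySQL server, here is a minimal sketch that runs the same query against an in-memory SQLite database. Everything beyond the query itself is an assumption for illustration: the sample rows are made up, and the `soundex` function is a hand-written implementation of standard four-character American Soundex registered under the name SOUNDEX. MySQL's built-in SOUNDEX() can return longer codes, so results may differ in edge cases.

```python
import sqlite3


def soundex(word):
    """Standard American Soundex: first letter plus three digits."""
    codes = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in letters:
            codes[ch] = digit
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        digit = codes.get(ch, "")
        # Emit a digit only when it differs from the previous coded letter
        if digit and digit != prev:
            result += digit
        # 'h' and 'w' do not separate letters that share a code
        if ch not in "hw":
            prev = digit
    return (result + "000")[:4]


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call SOUNDEX(...)
conn.create_function("SOUNDEX", 1, soundex)
conn.execute("CREATE TABLE restuarant (r_name TEXT, area TEXT, cost TEXT)")
conn.executemany(
    "INSERT INTO restuarant VALUES (?, ?, ?)",
    [
        ("Dosa Corner", "South", "cheap"),      # made-up sample rows
        ("Royal Dine", "South", "expensive"),
        ("Chaat House", "North", "cheap"),
    ],
)

rows = conn.execute(
    "SELECT DISTINCT r_name FROM restuarant "
    "WHERE area = ? AND SOUNDEX(cost) = SOUNDEX(?)",
    ("South", "cheep"),  # misspelled user input
).fetchall()
print(rows)
```

Passing the user input as a bound parameter, rather than concatenating it into the SQL string, also keeps the query safe when the input comes straight from a chat message.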

Solution in the Python layer:

The Fuzzy library has Soundex and DMetaphone APIs.

Steps to set up Fuzzy:

a. Make sure python3 is installed and 'C:\Program Files\Python36\Scripts' is set in PATH.

b. Download the Fuzzy-1.2.2.tar.gz library from https://pypi.python.org/pypi/Fuzzy

c. Extract it into a folder.

d. Run setup.py install from that folder.

Import and test in Python:

    import fuzzy

    dmtfn = fuzzy.DMetaphone(4)

    print(dmtfn('Hyderaabaad'), dmtfn('Hyderabad'))

```
>> [b'HTRP', None] [b'HTRP', None]
```

    print(dmtfn('Hyderaabaad')[0], dmtfn('Hyderabad')[0])

```
>> b'HTRP' b'HTRP'
```

A real use case (entity extractor in a chatbot):

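When you build a chatbot for restaurant search, you have to find a valid location, and the valid locations are predefined as a list of entities. The location in the user input therefore has to be recognized as an entity in the Python layer before it is passed to the database. In this case we can use the Soundex or DMetaphone API.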

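The snippet below reads the entities from a folder (all locations are in the cities.txt file) and creates the list of valid entities. The entities are then converted into DMetaphone codes. Finally, the input location is converted into DMetaphone codes and compared with the entity codes.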

    import os
    import fuzzy

    dmtfn = fuzzy.DMetaphone(4)
    uinput = 'restaurants in Hyderaabaad'  # example user input (placeholder)
    attributes = {}  # extracted entities, keyed by entity type

    # Read all entities from the entities folder and store them in a
    # dictionary keyed by file name without extension (e.g. 'cities')
    entities = {}
    for fil in os.listdir('./entities/'):
        with open(os.path.join('./entities/', fil)) as f:
            # strip() instead of line[:-1], so a last line without a
            # trailing newline does not lose its final character
            lines = [line.strip() for line in f if line.strip()]
        entities[os.path.splitext(fil)[0]] = '|'.join(lines)

    # Convert each word of the user input into its DMetaphone code
    # (split() also covers the single-word case)
    codes = [dmtfn(w)[0] for w in uinput.lower().split()]

    # If the code of an entity matches one of the input codes,
    # store that entity as a valid attribute for the intent
    for entity in entities:
        for i in entities[entity].split('|'):
            # entity extraction using sound codes, to tolerate spelling mistakes
            currCode = dmtfn(i.lower())[0]
            # skip missing codes so two un-encodable strings never match
            if currCode is not None and currCode in codes:
                attributes[entity] = i
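
The loop above re-encodes every entity for each request. As a rough sketch of one alternative, the codes can be computed once and stored in a lookup table keyed by phonetic code. To keep the example runnable without installing Fuzzy, it swaps in a small pure-Python Soundex function in place of `dmtfn`. The helper names `build_index` and `extract_attributes` and the sample city list are hypothetical and not part of the original answer; any encoder with the same call shape could be passed instead.

```python
from collections import defaultdict


def soundex(word):
    """Pure-Python stand-in encoder: standard four-character Soundex."""
    codes = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in letters:
            codes[ch] = digit
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit
        if ch not in "hw":  # 'h' and 'w' do not reset the previous code
            prev = digit
    return (result + "000")[:4]


def build_index(entities, encode):
    """Map each phonetic code to the (entity_type, value) pairs it covers."""
    index = defaultdict(list)
    for entity_type, values in entities.items():
        for value in values:
            code = encode(value)
            if code:  # ignore values that cannot be encoded
                index[code].append((entity_type, value))
    return index


def extract_attributes(uinput, index, encode):
    """Return {entity_type: value} for input words that sound like an entity."""
    attributes = {}
    for word in uinput.split():
        code = encode(word)
        if not code:
            continue
        for entity_type, value in index.get(code, []):
            attributes[entity_type] = value
    return attributes


# Hypothetical entity list; in the answer these come from ./entities/cities.txt
entities = {"cities": ["Hyderabad", "Chennai", "Mumbai"]}
index = build_index(entities, soundex)
print(extract_attributes("restaurants in Hyderaabaad", index, soundex))
```

Like the original loop, this compares single input words against whole entity names, so a multi-word entity such as a two-word city name would need extra handling.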

Handling spelling mistakes in user-defined names in a Python application
