Classifying sentences using a dictionary

Time: 2018-09-22 13:29:15

Tags: python text categorization

I am using the following function to classify sentences into themes:

def theme(x):
    output =[]
    category = ()
    for i in x:
        if 'AC' in i:
            category = 'AC problem'
        elif 'insects' in i:
            category = 'Cleanliness'
        elif 'clean' in i:
            category = 'Cleanliness'
        elif 'food' in i:
            category = 'Food Problem'
        elif 'delay' in i:
            category = 'Train Delayed'
        else:
            category = 'None'
        output.append(category)
    return output

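I don't want to repeat an if statement for every word in a category. Instead, I would like to supply a list/dictionary, e.g. Cleanliness = ['Clean', 'Cleaned', 'spoilt', 'dirty'], so that any sentence containing one of those words gets the category 'Cleanliness'. How can I do this?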

4 answers:

Answer 0 (score: 1)

You can use a dict of sets to organize the words by category, and then build a word-to-category lookup dict from that structure:

categories = {
    'Cleanliness': {'insects', 'clean'},
    'AC Problem': {'AC'},
    'Food Problem': {'food'},
    'Train Delayed': {'delay'}
}
lookup = {word: category for category, words in categories.items() for word in words}
def theme(x):
    return {lookup.get(word, 'None') for word in x}

so that theme(['AC', 'clean', 'insects']) returns a set of the corresponding categories:

{'Cleanliness', 'AC Problem'}
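
Note that `lookup` matches whole list items, while the question passes in sentences. Here is a minimal sketch of one way to bridge that gap, assuming each sentence is split on whitespace and matched case-insensitively. The name `theme_sentences`, the lowercase `'ac'` key, and the "first matching word wins" rule are my assumptions, not part of the answer above:

```python
categories = {
    'Cleanliness': {'insects', 'clean'},
    'AC Problem': {'ac'},          # lowercase key (assumption) for case-insensitive matching
    'Food Problem': {'food'},
    'Train Delayed': {'delay'},
}
# Invert the structure: one entry per word, pointing at its category.
lookup = {word: category for category, words in categories.items() for word in words}

def theme_sentences(sentences):
    """Return one category per sentence; the first matching word decides."""
    output = []
    for sentence in sentences:
        category = 'None'
        for word in sentence.lower().split():
            if word in lookup:
                category = lookup[word]
                break
        output.append(category)
    return output

print(theme_sentences(['The AC is broken', 'Found insects in the coach', 'Nice trip']))
# ['AC Problem', 'Cleanliness', 'None']
```

This matches whole words only, so 'delayed' would not match 'delay', and punctuation attached to a word (e.g. 'AC.') is not stripped. Add the extra word forms to the sets if you need them.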

Answer 1 (score: 1)

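This should do what you asked. I made all the keys lowercase and convert i to lowercase when checking for a match, so an item still matches even if its capitalization is different.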

def theme(x):
    output = []

    # Keys are lowercase so that lookups can be case-insensitive.
    # I recommend a more descriptive name for this dictionary in your actual program.
    myDict = {"ac": "AC problem", "insects": "Cleanliness", "clean": "Cleanliness", "food": "Food Problem", "delay": "Train Delayed"}

    for i in x:
        if i.lower() in myDict:  # Check membership first; prevents a possible KeyError
            category = myDict[i.lower()]  # The category is the value stored under the key
            output.append(category)
        else:
            output.append("None")  # i isn't in the dictionary, so append "None" instead
    return output

Here are some examples:

>>>print(theme(['Clean', 'Cleaned', 'spoilt', 'dirty']))
['Cleanliness', 'None', 'None', 'None']

>>>print(theme(['Delay', 'Ham', 'Cheese', 'Insects']))
['Train Delayed', 'None', 'None', 'Cleanliness']
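
In the first example, 'Cleaned', 'spoilt' and 'dirty' come back as 'None' because they are not keys in the dictionary. Here is a rough sketch of building the dictionary from per-category word lists, as the question asks. The 'Cleanliness' list is taken from the question, while `word_lists`, `word_to_category`, `theme_from_lists` and the 'Train Delayed' entry are names and data I made up for illustration:

```python
# Category -> list of trigger words.
word_lists = {
    "Cleanliness": ["Clean", "Cleaned", "spoilt", "dirty"],  # list from the question
    "Train Delayed": ["delay"],                              # illustrative entry (assumption)
}

# Flatten into a lowercase word -> category dictionary.
word_to_category = {
    word.lower(): category
    for category, words in word_lists.items()
    for word in words
}

def theme_from_lists(x):
    # dict.get returns the default "None" string when the word is unknown.
    return [word_to_category.get(i.lower(), "None") for i in x]

print(theme_from_lists(['Clean', 'Cleaned', 'spoilt', 'dirty', 'Ham']))
# ['Cleanliness', 'Cleanliness', 'Cleanliness', 'Cleanliness', 'None']
```

With this layout, adding a new trigger word only means extending one of the lists.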

Answer 2 (score: 0)

I came up with another approach:

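# Assumed word lists (lowercase); the original snippet used these names without defining them.
cleanliness = ['clean', 'cleaned', 'spoilt', 'dirty']
ac_problem = ['ac']

def theme(x):
    output = []
    for i in x:
        words = i.lower().split()  # split the sentence into lowercase words
        if set(cleanliness).intersection(words):
            category = 'clean'
        elif set(ac_problem).intersection(words):
            category = 'ac problem'
        else:
            category = 'none'
        output.append(category)
    return output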
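
The if/elif chain above still grows by one branch per category. As a sketch only, the same set-intersection idea can be driven from a single dict, so that adding a category needs no new branch. The `categories` contents and the name `theme_by_intersection` are assumptions for illustration:

```python
# Category name -> set of lowercase trigger words (contents are illustrative).
categories = {
    'clean': {'clean', 'cleaned', 'spoilt', 'dirty'},
    'ac problem': {'ac'},
}

def theme_by_intersection(x):
    output = []
    for i in x:
        words = set(i.lower().split())
        # Take the first category (in dict insertion order) that shares a word
        # with the sentence; fall back to 'none' if nothing matches.
        category = next(
            (name for name, keywords in categories.items() if keywords & words),
            'none',
        )
        output.append(category)
    return output

print(theme_by_intersection(['The seat was dirty', 'AC not working', 'ok']))
# ['clean', 'ac problem', 'none']
```

If a sentence matches several categories, the one defined first in the dict wins, which mirrors the order of the elif branches.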

Answer 3 (score: -1)

Maybe you could do it like this:

def theme(x):
    output = []
    name_dic = {"AC": "AC problem",
                "clean": "Cleanliness",
                "food": "Food Problem"
                }
    for e in x:
        output.append(name_dic.get(e))

    return output

Or rather, like this:

def theme(x):
    output = []
    name_list = [
        ("AC", "AC problem"),
        ("clean", "Cleanliness"),
        ("insects", "Cleanliness"),
        ("food", "Food Problem")
    ]
    name_dic = dict(name_list)
    for e in x:
        output.append(name_dic.get(e))

    return output

Hope this helps.
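
One caveat: `name_dic.get(e)` only matches when the whole item equals a key, and it returns the `None` object for a miss, whereas the function in the question tests substrings with `in` and uses the string 'None'. Here is a minimal sketch that keeps the question's substring behaviour while still being table-driven. The name `theme_substring` is hypothetical, and the keys are the ones from the question:

```python
# Keyword -> category, in the same order as the if/elif chain in the question.
name_dic = {
    "AC": "AC problem",
    "insects": "Cleanliness",
    "clean": "Cleanliness",
    "food": "Food Problem",
    "delay": "Train Delayed",
}

def theme_substring(x):
    output = []
    for e in x:
        # First keyword found as a substring of the sentence decides the category.
        category = next((cat for key, cat in name_dic.items() if key in e), "None")
        output.append(category)
    return output

print(theme_substring(['The AC is off', 'train delayed by 2 hours', 'very clean coach', 'hello']))
# ['AC problem', 'Train Delayed', 'Cleanliness', 'None']
```

As in the question's code, the match is case-sensitive and purely substring-based, so 'clean' also matches inside 'unclean'.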