Modifying a corpus in Python

Asked: 2015-09-17 18:11:47

Tags: python python-2.7

I have a dataset (a corpus of customer reviews) that looks like this:

documents = [["I like the product", "5"],["the product is poor", "2.5"],["it is an okay product", "3"],["the quality is poor", "1"],["color is great", "3.5"]]

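The first value in each inner list is the review text, which I want to modify based on the second value (the score). The score can be any number between 1 (lowest) and 5 (highest). I want to append " GOOD" to the text if its score is greater than 3, and " BAD" if its score is less than 3. So the output should look like this: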

[["I like the product GOOD", "5"],["the product is poor BAD", "2.5"],["it is an okay product", "3"],["the quality is poor BAD", "1"],["color is great GOOD", "3.5"]]

I wrote the following code, which fails with the error 'str' object has no attribute 'insert':

for document in documents:
    if int(float(document[1])) > 3:
        document[0].insert('GOOD')
    elif int(float(document[1])) < 3:
        document[0].insert('BAD')
    else:
        document[0].insert()

Any suggestions? Thanks in advance.

4 Answers:

Answer 0 (score: 2)

Right, str objects have no insert method.

Just append to the string instead:

document[0] += ' GOOD'
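
A minimal sketch of why this works, using one entry from the question's data: strings are immutable, so `+=` builds a new string and stores it back in the list slot `document[0]`. The variable names `has_insert` and `original` are only for illustration.

```python
document = ["I like the product", "5"]

# insert is a list method; str objects do not have it
has_insert = hasattr(document[0], "insert")

original = document[0]
document[0] += " GOOD"  # creates a new string and rebinds the list element

print(has_insert)  # False
print(original)    # I like the product
print(document)    # ['I like the product GOOD', '5']
```

The old string object is left unchanged; only the list element now refers to the new string.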

Answer 1 (score: 1)

You can use a list comprehension with a conditional (ternary) expression:

docs = [[doc[0] + (" GOOD" if float(doc[1]) > 3 
                 else (" BAD" if float(doc[1]) < 3 else ""))] 
        for doc in documents]

>>> docs
[['I like the product GOOD'],
 ['the product is poor BAD'],
 ['it is an okay product'],
 ['the quality is poor BAD'],
 ['color is great GOOD']]

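Answer 2 (score: 1)

Besides strings being immutable and having no insert method, your else branch is redundant. A score can only be >, <, or == 3; if the first two tests are false, it must be equal, so nothing should be done to that entry: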

for doc in documents:
    score = float(doc[1])  # compare as float so 2.5 and 3.5 are not truncated
    if score > 3:
        doc[0] += " GOOD"
    elif score < 3:
        doc[0] += " BAD"
print(documents)

[['I like the product GOOD', '5'], ['the product is poor BAD', '2.5'],
 ['it is an okay product', '3'], ['the quality is poor BAD', '1'],
 ['color is great GOOD', '3.5']]
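
One detail worth checking: the question's code converts each score with `int(float(...))`, which truncates fractional scores before comparing, so "3.5" becomes 3 and gets no tag. The sketch below compares truncated and exact comparisons on the question's scores; `label` is a hypothetical helper introduced only for this illustration.

```python
scores = ["5", "2.5", "3", "1", "3.5"]

def label(value):
    # Hypothetical helper: map a numeric score to a tag
    if value > 3:
        return "GOOD"
    elif value < 3:
        return "BAD"
    return ""

# int() drops the fractional part: 3.5 -> 3, 2.5 -> 2
truncated = [label(int(float(s))) for s in scores]
# Comparing the float directly keeps 3.5 above the threshold
exact = [label(float(s)) for s in scores]

print(truncated)  # ['GOOD', 'BAD', '', 'BAD', '']
print(exact)      # ['GOOD', 'BAD', '', 'BAD', 'GOOD']
```

Only the float comparison matches the output requested in the question, where "color is great" with score "3.5" is tagged GOOD.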

Answer 3 (score: 0)

This can be done with a list comprehension:

documents = [['I like the product', '5'],['the product is poor', '2.5'],['it is an okay product', '3'],['the quality is poor', '1'],['color is great', '3.5']]

documents = [[x[0] + (' GOOD' if float(x[1]) > 3 else ' BAD' if float(x[1]) < 3 else ''), x[1]] for x in documents]
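
If the nested conditional is hard to read, the same logic can be moved into a small function. This is only a sketch: `tag_review` and its `threshold` parameter are hypothetical names, not part of any answer above. It builds a new list and leaves the original data untouched.

```python
def tag_review(text, score, threshold=3.0):
    """Return text with ' GOOD' or ' BAD' appended, depending on the score."""
    value = float(score)  # scores are stored as strings such as "2.5"
    if value > threshold:
        return text + " GOOD"
    if value < threshold:
        return text + " BAD"
    return text  # score equals the threshold: leave the text as is

documents = [["I like the product", "5"], ["the product is poor", "2.5"],
             ["it is an okay product", "3"], ["the quality is poor", "1"],
             ["color is great", "3.5"]]

# Build a new list of [text, score] pairs; the input list is not modified
tagged = [[tag_review(text, score), score] for text, score in documents]
print(tagged)
```

Keeping the score as the second element preserves the structure requested in the question.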