I have a dataset (a corpus of customer reviews) that looks like this:
documents =
[["I like the product", "5"],["the product is poor", "2.5"],["it is an okay product", "3"],["the quality is poor", "1"],["color is great", "3.5"]]
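The first item of each inner list is the text I want to modify based on the second item (the score). The score can be any number between 1 (lowest) and 5 (highest). I want to append " GOOD" to the text if its score is greater than 3, and " BAD" if its score is less than 3. The output should look like this: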
[["I like the product GOOD", "5"],["the product is poor BAD", "2.5"],["it is an okay product", "3"],["the quality is poor BAD", "1"],["color is great GOOD", "3.5"]]
I wrote the following code, which fails with "'str' object has no attribute 'insert'":
for document in documents:
    if int(float(document[1])) > 3:
        document[0].insert('GOOD')
    elif int(float(document[1])) < 3:
        document[0].insert('BAD')
    else:
        document[0].insert()
Any suggestions? Thanks in advance.
Answer 0 (score: 2)
Yes, str objects have no insert method.
Just append to the string instead:
document[0] += ' GOOD'
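As a minimal sketch of why this works: Python strings are immutable, so `+=` builds a new string and stores it back into the list slot rather than changing the old string in place. The variable names here are only for illustration.

```python
document = ["I like the product", "5"]

# Lists have an insert method, but str does not.
assert not hasattr(document[0], "insert")

original = document[0]
document[0] += " GOOD"  # builds a new string and rebinds document[0] to it

print(document)  # ['I like the product GOOD', '5']
print(original)  # I like the product (the old string is unchanged)
```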
Answer 1 (score: 1)
You can use a list comprehension with a conditional (ternary) expression:
docs = [[doc[0] + (" GOOD" if float(doc[1]) > 3
else (" BAD" if float(doc[1]) < 3 else ""))]
for doc in documents]
>>> docs
[['I like the product GOOD'],
['the product is poor BAD'],
['it is an okay product'],
['the quality is poor BAD'],
['color is great GOOD']]
Answer 2 (score: 1)
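Apart from strings being immutable and having no insert method, your else branch is redundant. A score can only be >, < or == 3; if the first two tests are false, it must be equal, so nothing needs to be done in that case: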
for doc in documents:
    # compare as a float; int() would truncate 3.5 to 3 and miss it
    f = float(doc[1])
    if f > 3:
        doc[0] += " GOOD"
    elif f < 3:
        doc[0] += " BAD"
print(documents)
[['I like the product GOOD', '5'], ['the product is poor BAD', '2.5'],
['it is an okay product', '3'], ['the quality is poor BAD', '1'],
['color is great GOOD', '3.5']]
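One detail to watch when converting the score: `int(float(score))` truncates toward zero, so "3.5" becomes 3 and fails the `> 3` test, while comparing the float directly keeps it. A quick check of the two conversions on the scores from the question, as a sketch:

```python
scores = ["5", "2.5", "3", "1", "3.5"]

# int() drops the fractional part before the comparison
truncated = [int(float(s)) > 3 for s in scores]

# comparing the float directly keeps 3.5 above the threshold
exact = [float(s) > 3 for s in scores]

print(truncated)  # [True, False, False, False, False]
print(exact)      # [True, False, False, False, True]
```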
Answer 3 (score: 0)
This can be done with a list comprehension:
documents = [['I like the product', '5'],['the product is poor', '2.5'],['it is an okay product', '3'],['the quality is poor', '1'],['color is great', '3.5']]
documents = [[x[0] + (' GOOD' if float(x[1]) > 3 else ' BAD' if float(x[1]) < 3 else ''), x[1]] for x in documents]
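If the nested ternary is hard to read, the same logic can be moved into a small helper. This is only a sketch; `tag_for` is a hypothetical name, not something from the question.

```python
def tag_for(score):
    """Return ' GOOD', ' BAD' or '' for a score given as a string or number."""
    value = float(score)
    if value > 3:
        return " GOOD"
    if value < 3:
        return " BAD"
    return ""  # exactly 3: leave the text unchanged


documents = [["I like the product", "5"], ["the product is poor", "2.5"],
             ["it is an okay product", "3"], ["the quality is poor", "1"],
             ["color is great", "3.5"]]

# Build a new list, keeping the score alongside the tagged text.
tagged = [[text + tag_for(score), score] for text, score in documents]
print(tagged)
```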