Question

I'm new to coding in Python, so please go easy on me. I have a tokenized pandas Series that looks like this:

reviews = [['bad', 'movie', 'it', 'was', 'turrible'],['bad', 'acting', 'in', 'it'], ['ok', 'experience'],...]

I have a dictionary like this:

d = {'turrible':'terrible', 'ok':'okay',...}

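Any word in the reviews that appears as a key in the dictionary should be replaced with the corresponding value. So the expected output is: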

reviews = [['bad', 'movie', 'it', 'was', 'terrible'],['bad', 'acting', 'in', 'it'], ['okay', 'experience'],...]

I have been searching for hours and have tried the following solutions, but none of them gives the expected output.

Attempt 1:

pattern = re.compile(r'\b(' + '|'.join(d.keys()) + r')\b')
result = pattern.sub(lambda x: d[x.group()], reviews)

Output: error: incomplete escape \u

Attempt 2:

def replaceWords(text,wdict):
    return ''.join(wdict.get(word,word) for word in text)
replaceWords(docs,d)
Output: TypeError unhashable type: 'list'

Attempt 3 (no error message, but not the expected output):

reviews = reviews.replace(d)

Attempt 4:

reviews = reviews.replace(d, regex=True)
error: missing ), unterminated subpattern

Any help would be greatly appreciated.

Edit: corrected the structure of the reviews Series.

Answer 1

>>> reviews = ['bad' 'movie' 'it' 'was' 'terrible','bad' 'acting' 'in' 'it',
   ...:  'okay' 'experience']

>>> reviews
 ['badmovieitwasterrible', 'badactinginit', 'okayexperience']

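That is probably not what you want: Python concatenates adjacent string literals, so without commas the words in each review merge into a single string.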
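
Once each review really is a list of separate tokens, one possible approach is to look up every token in the dictionary and fall back to the token itself when there is no entry. The following is only a minimal sketch under that assumption; the helper name replace_tokens is made up for illustration, and plain lists stand in for the Series so the example runs without pandas.

```python
def replace_tokens(tokens, mapping):
    """Return a new list in which every token found in mapping is swapped for its value."""
    # dict.get(token, token) returns the replacement if the token is a key,
    # and the unchanged token otherwise.
    return [mapping.get(token, token) for token in tokens]


# Sample data taken from the question (the elided "..." entries are left out).
reviews = [
    ['bad', 'movie', 'it', 'was', 'turrible'],
    ['bad', 'acting', 'in', 'it'],
    ['ok', 'experience'],
]
d = {'turrible': 'terrible', 'ok': 'okay'}

# Apply the replacement to each tokenized review.
fixed = [replace_tokens(tokens, d) for tokens in reviews]
print(fixed)
```

If reviews is a pandas Series of token lists, the same helper could presumably be applied per row with reviews.apply(lambda tokens: replace_tokens(tokens, d)). Because this looks tokens up directly rather than building a regular expression, special characters in the dictionary keys should not matter.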
