Question

I have a dataset of reviews that contains reviews such as:

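Simply the best. I bought this last year. Still using. No problems faced till date.Amazing battery life. Works fine in darkness or broad daylight. Best gift for any book lover.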

(This is from the original dataset. In the processed dataset I removed all punctuation and converted everything to lowercase.)

What I want to do is replace some words with 1 (according to my dictionary) and all other words with 0. My dictionary is:

dict = {"amazing":"1","super":"1","good":"1","useful":"1","nice":"1","awesome":"1","quality":"1","resolution":"1","perfect":"1","revolutionary":"1","and":"1","good":"1","purchase":"1","product":"1","impression":"1","watch":"1","quality":"1","weight":"1","stopped":"1","i":"1","easy":"1","read":"1","best":"1","better":"1","bad":"1"}

I want the output to look like this:

0010000000000001000000000100000

I used the following code:

df['newreviews'] = df['reviews'].map(dict).fillna("0")

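This always returns 0 as the output, which is not what I want. I tried making the 1 and 0 strings, but I still get the same result. Any suggestions on how to solve this?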

Answer 1

You can do it like this:

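# `sent` is assumed to hold one review string; `mydict` is the dictionary from the question
import re

# clean the sentence: delete periods
sent = re.sub(r'\.', '', sent)

# convert to a list of lowercase words
sent = sent.lower().split()

# build the output: '1' if the word is a key in the dictionary, otherwise '0'
new_sent = ''.join(['1' if x in mydict else '0' for x in sent])
print(new_sent)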

'001100000000000000000000100000'
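The snippet above assumes that `sent` and `mydict` already exist. Here is a minimal self-contained sketch of the same idea. The function name `encode_review` and the shortened `vocab` set are illustrative and not part of the original answer.

```python
import re

# Hypothetical subset of the question's dictionary keys, for illustration only.
vocab = {"amazing", "best", "i", "good", "quality"}

def encode_review(text, vocab):
    """Return one character per word: '1' if the word is in vocab, else '0'."""
    # Delete periods (as in the snippet above), lowercase, split on whitespace.
    words = re.sub(r"\.", "", text).lower().split()
    return "".join("1" if w in vocab else "0" for w in words)

review = ("Simply the best. I bought this last year. Still using. "
          "No problems faced till date.Amazing battery life. "
          "Works fine in darkness or broad daylight. Best gift for any book lover.")
encoded = encode_review(review, vocab)
print(encoded)
```

Note that because periods are deleted rather than replaced with a space, "date.Amazing" becomes the single token "dateamazing" and is scored 0.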

Answer 2

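First, don't use dict as a variable name, because it shadows the Python builtin of the same name. Then use a list comprehension with the dictionary's get method, which replaces unmatched words with 0.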

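Note:

If the data contains text like date.Amazing, with no space after the punctuation, the punctuation has to be replaced with a space rather than simply removed.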
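To illustrate the difference between deleting punctuation and replacing it with a space, here is a small sketch using Python's re module on a fragment of the sample review. The variable names are illustrative.

```python
import re

text = "No problems faced till date.Amazing battery life."

# Deleting punctuation glues the two neighbouring words together.
deleted = re.sub(r"[^\w\s]+", "", text).lower().split()

# Replacing punctuation with a space keeps them as separate words.
spaced = re.sub(r"[^\w\s]+", " ", text).lower().split()

print(deleted)
print(spaced)
```

With deletion, "amazing" is hidden inside "dateamazing" and can never match a dictionary key. With the space replacement it is a word of its own.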

import pandas as pd

df = pd.DataFrame({'reviews':['Simply the best. I bought this last year. Still using. No problems faced till date.Amazing battery life. Works fine in darkness or broad daylight. Best gift for any book lover.']})

d = {"amazing":"1","super":"1","good":"1","useful":"1","nice":"1","awesome":"1","quality":"1","resolution":"1","perfect":"1","revolutionary":"1","and":"1","good":"1","purchase":"1","product":"1","impression":"1","watch":"1","quality":"1","weight":"1","stopped":"1","i":"1","easy":"1","read":"1","best":"1","better":"1","bad":"1"}

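# regex=True is required in pandas 2.0+, where str.replace treats the pattern as a literal string by default
df['reviews'] = df['reviews'].str.replace(r'[^\w\s]+', ' ', regex=True).str.lower()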

Alternatives:

df['newreviews'] = [''.join(d.get(y, '0')  for y in x.split()) for x in df['reviews']]

df['newreviews'] =  df['reviews'].apply(lambda x: ''.join(d.get(y, '0')  for y in x.split()))
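For context, the original map attempt returns only 0 because Series.map with a dictionary looks up each whole cell value as a key, and an entire review is never a key in the dictionary. The plain-Python sketch below mimics that lookup with dict.get (an analogy, not pandas itself) and contrasts it with the per-word lookup used above. The shortened dictionary `d_small` is hypothetical.

```python
# Hypothetical shortened dictionary, for illustration only.
d_small = {"best": "1", "i": "1", "amazing": "1"}
review = "simply the best i bought this last year"

# Whole-string lookup: the full review is not a key, so the default "0" is used.
whole = d_small.get(review, "0")

# Per-word lookup: split first, then look up each word separately.
per_word = "".join(d_small.get(word, "0") for word in review.split())

print(whole)
print(per_word)
```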

Answer 3

You can do this with:

df.replace(repl, regex=True, inplace=True)

where df is your DataFrame and repl is your dictionary.
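One caveat worth checking before relying on this: with regex=True the dictionary keys are treated as regular expression patterns, so a short key such as "i" can also match inside longer words, and words that are not in the dictionary are left unchanged rather than set to 0. The plain-Python sketch below approximates that behavior with re.sub (an analogy, not the pandas implementation) and shows how word boundaries restrict matches to whole words. `repl_small` is a hypothetical shortened dictionary.

```python
import re

# Hypothetical shortened dictionary, for illustration only.
repl_small = {"i": "1", "best": "1"}
text = "simply the best i bought"

# Apply each key as a bare pattern: "i" also matches inside "simply".
loose = text
for pattern, value in repl_small.items():
    loose = re.sub(pattern, value, loose)

# Wrap each key in word boundaries so that only whole words match.
strict = text
for pattern, value in repl_small.items():
    strict = re.sub(r"\b" + re.escape(pattern) + r"\b", value, strict)

print(loose)
print(strict)
```

In both cases the unmatched words remain in the text, so an extra step would still be needed to turn them into 0.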

Replace particular words using a user dictionary and all other words with 0
