Replace specific words using a user dictionary, and replace all other words with 0

Asked: 2018-12-08 15:31:28

Tags: python python-3.x pandas dictionary dataframe

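So I have a dataset of reviews, which contains reviews like

Simply the best. I bought this last year. Still using. No problems faced till date.Amazing battery life. Works fine in darkness or broad daylight. Best gift for any book lover.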

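(This is from the original dataset; in the processed dataset I removed all punctuation and converted everything to lowercase.)

What I want to do is replace some words with 1 (according to my dictionary) and all other words with 0. My dictionary is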

dict = {"amazing":"1","super":"1","good":"1","useful":"1","nice":"1","awesome":"1","quality":"1","resolution":"1","perfect":"1","revolutionary":"1","and":"1","good":"1","purchase":"1","product":"1","impression":"1","watch":"1","quality":"1","weight":"1","stopped":"1","i":"1","easy":"1","read":"1","best":"1","better":"1","bad":"1"}

I want the output to look like this:

0010000000000001000000000100000

I used the following code:

df['newreviews'] = df['reviews'].map(dict).fillna("0")

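This always returns 0 as the output. I did not want that, so I stored the 1s and 0s as strings, but I still get the same result. Any suggestions on how to solve this?

3 Answers:

Answer 0: (score: 1)

You can do it like this: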

# sent is one review string; mydict is the question's dictionary (renamed from dict)
# clean the sentence
import re
sent = re.sub(r'\.','',sent)

# convert to list
sent = sent.lower().split()

# get values from dict using comprehension
new_sent = ''.join([str(1) if x in mydict else str(0) for x in sent])
print(new_sent)

'001100000000000000000000100000'
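
As a self-contained sketch of the same idea, the snippet below wraps the steps in a function. The name encode, its sep parameter, and the three-word dictionary (a subset of the one in the question) are assumptions made for illustration. It also shows why the output above has 30 digits: deleting the periods turns date.Amazing into a single word, dateamazing, which is not in the dictionary. Replacing each period with a space instead keeps the two words apart.

```python
import re

# Illustrative subset of the question's dictionary (assumption: the other
# keys do not occur in this review, so the result is the same).
mydict = {"amazing": "1", "best": "1", "i": "1"}

def encode(sent, lookup, sep=""):
    """Return one '1' or '0' per whitespace-separated word of sent."""
    # Replace every period with sep ("" deletes it, " " keeps words apart).
    cleaned = re.sub(r"\.", sep, sent)
    return "".join("1" if w in lookup else "0" for w in cleaned.lower().split())

review = ("Simply the best. I bought this last year. Still using. "
          "No problems faced till date.Amazing battery life. "
          "Works fine in darkness or broad daylight. "
          "Best gift for any book lover.")

print(encode(review, mydict))           # 001100000000000000000000100000
print(encode(review, mydict, sep=" "))  # 0011000000000001000000000100000
```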

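Answer 1: (score: 1)

First, don't use dict as a variable name, because it shadows the Python built-in dict type; then use a list comprehension with get to replace unmatched values with 0.

Note:

If the data contains text like date.Amazing, with no space after the punctuation, replace the punctuation with a space.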

import pandas as pd

df = pd.DataFrame({'reviews':['Simply the best. I bought this last year. Still using. No problems faced till date.Amazing battery life. Works fine in darkness or broad daylight. Best gift for any book lover.']})

d = {"amazing":"1","super":"1","good":"1","useful":"1","nice":"1","awesome":"1","quality":"1","resolution":"1","perfect":"1","revolutionary":"1","and":"1","good":"1","purchase":"1","product":"1","impression":"1","watch":"1","quality":"1","weight":"1","stopped":"1","i":"1","easy":"1","read":"1","best":"1","better":"1","bad":"1"}

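# regex=True is required on pandas 2.0+, where str.replace treats the pattern literally by default
df['reviews'] = df['reviews'].str.replace(r'[^\w\s]+', ' ', regex=True).str.lower()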

df['newreviews'] = [''.join(d.get(y, '0')  for y in x.split()) for x in df['reviews']]

Alternative:

df['newreviews'] =  df['reviews'].apply(lambda x: ''.join(d.get(y, '0')  for y in x.split()))
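
To see why the per-word lookup matters, here is a minimal sketch that compares it with the map call from the question. Series.map looks up each whole cell value as a dictionary key, so a full review never matches and fillna turns every row into "0". The short review and the three-word dictionary are shortened stand-ins chosen for illustration, not the real data.

```python
import pandas as pd

# Illustrative subset of the question's dictionary.
d = {"amazing": "1", "best": "1", "i": "1"}

# A shortened, already cleaned review (assumption for the example).
df = pd.DataFrame({"reviews": ["simply the best i bought this last year"]})

# map() uses the whole string as the key, so nothing is found.
whole_cell = df["reviews"].map(d).fillna("0")

# Splitting first and looking up each word gives one digit per word.
per_word = df["reviews"].apply(
    lambda x: "".join(d.get(y, "0") for y in x.split())
)

print(whole_cell.iloc[0])  # 0
print(per_word.iloc[0])    # 00110000
```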

Answer 2: (score: 0)

You can do this with

df.replace(repl, regex=True, inplace=True)

where df is your DataFrame and repl is your dictionary.
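
A hedged sketch of what this call does: with regex=True the dictionary keys are treated as regular expressions, so a bare key such as i would also match inside other words. The example below assumes the keys are wrapped in word boundaries first, and the two-word dictionary and short review are made up for illustration. Note that this only rewrites the matched words; the remaining words are left unchanged, so turning them into 0 would still need a separate step.

```python
import re
import pandas as pd

df = pd.DataFrame({"reviews": ["simply the best i bought this last year"]})

# Illustrative subset of the question's dictionary.
words = {"best": "1", "i": "1"}

# Wrap each key in \b...\b so "i" does not match inside "simply" or "this".
repl = {rf"\b{re.escape(w)}\b": v for w, v in words.items()}

# Returns a new DataFrame; the original answer used inplace=True instead.
out = df.replace(repl, regex=True)

print(out.loc[0, "reviews"])  # simply the 1 1 bought this last year
```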