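Error when applying a function to dictionary values

Time: 2018-08-03 12:20:19

Tags: string python-3.x function dictionary text

I am trying to apply a function to the values of my dictionary with the following statement: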

x_text = [clean_str(v) for k, v in answer.items()]

The clean_str function:

import re

def clean_str(string):
    # remove stopwords
    # string = ' '.join([word for word in string.split() if word not in cachedStopWords])
    string = re.sub(r"[^A-Za-z0-9(),!?\'\`]", " ", string)
    string = re.sub(r"\'s", " \'s", string)
    string = re.sub(r"\'ve", " \'ve", string)
    string = re.sub(r"n\'t", " n\'t", string)
    string = re.sub(r"\'re", " \'re", string)
    string = re.sub(r"\'d", " \'d", string)
    string = re.sub(r"\'ll", " \'ll", string)
    string = re.sub(r",", " , ", string)
    string = re.sub(r"!", " ! ", string)
    # raw replacement strings: same output as before, without invalid-escape warnings
    string = re.sub(r"\(", r" \( ", string)
    string = re.sub(r"\)", r" \) ", string)
    string = re.sub(r"\?", r" \? ", string)
    string = re.sub(r"\s{2,}", " ", string)
    return string.strip().lower()

But I get the following error:

  

  File "C:\ProgramData\Anaconda3\lib\re.py", line 191, in sub
    return _compile(pattern, flags).sub(repl, string, count)

TypeError: expected string or bytes-like object

Below is an excerpt of the first 2 k, v pairs of my dictionary (answer):

In[45]:{k: answer[k] for k in list(answer)[:2]}
Out[45]: 
{b'B00308CJ12': [b'Bulletproof Salesman (2008)'],
 b'189138922X': [b'Classical Mechanics']} 

1 answer:

Answer 0 (score: 0)

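Each value in the dictionary is a list of bytes objects, not a string. clean_str(v) passes the whole list to re.sub, which raises this TypeError. Even a single bytes element would fail, because the patterns in clean_str are str patterns and re.sub cannot apply a str pattern to bytes.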
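To see where the TypeError comes from, here is a small check. It is only a sketch: `error_message` is a hypothetical helper, not part of the question, and the exact wording of the messages may differ between Python versions. It passes a list, a single bytes element, and a decoded str to re.sub using the first pattern from clean_str:

```python
import re

# First pattern used by clean_str in the question.
pattern = r"[^A-Za-z0-9(),!?\'\`]"

# One value copied from the dictionary excerpt in the question.
value = [b'Classical Mechanics']


def error_message(arg):
    """Return the TypeError message re.sub raises for arg, or None if it succeeds."""
    try:
        re.sub(pattern, " ", arg)
    except TypeError as exc:
        return str(exc)
    return None


list_error = error_message(value)             # whole list: the error from the question
bytes_error = error_message(value[0])         # one bytes element: a different TypeError
str_error = error_message(value[0].decode())  # decoded str: no error

print(list_error)
print(bytes_error)
print(str_error)
```

Only the decoded str goes through, so both the list and the bytes have to be dealt with.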

You should iterate over each list and use the decode() method to convert every bytes element to a string:

x_text = [clean_str(i.decode()) for k, v in answer.items() for i in v]
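As a rough end-to-end sketch, the same idea can be run on the two entries from the question. The `clean_str` here is a trimmed stand-in that keeps only two of the original substitutions, the explicit "utf-8" encoding is an assumption about the data, and `cleaned` is a hypothetical variant for the case where the keys need to be kept:

```python
import re


def clean_str(string):
    # Trimmed stand-in for the clean_str in the question (only two of its steps).
    string = re.sub(r"[^A-Za-z0-9(),!?\'\`]", " ", string)
    string = re.sub(r"\s{2,}", " ", string)
    return string.strip().lower()


# Sample data copied from the excerpt in the question.
answer = {
    b'B00308CJ12': [b'Bulletproof Salesman (2008)'],
    b'189138922X': [b'Classical Mechanics'],
}

# Flat list of cleaned titles; UTF-8 is an assumption about how the bytes are encoded.
x_text = [clean_str(i.decode("utf-8")) for v in answer.values() for i in v]

# Hypothetical variant that keeps each (decoded) key with its cleaned titles.
cleaned = {
    k.decode("utf-8"): [clean_str(i.decode("utf-8")) for i in v]
    for k, v in answer.items()
}

print(x_text)
print(cleaned)
```

Iterating over answer.values() avoids the unused k. If the bytes are not valid UTF-8, decode() raises UnicodeDecodeError, so the encoding argument may need to match the actual source of the data.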