Python 3 error with str and bytes objects

Date: 2018-12-11 16:21:52

Tags: python string python-3.x byte nltk

I am trying to write a simple function that cleans up text so it can be summarized:

def getTextWaPo(url):
    page = urllib2.urlopen(url).read().decode('utf8')
    soup = BeautifulSoup(page, "lxml")
    text = ' '.join(map(lambda p: p.text, soup.find_all('article')))
    return text.encode('ascii', errors='replace').replace("?"," ")

But with this code I get the following error:

  File "Autosummarizer.py", line 12, in getTextWaPo
  return text.encode('ascii', errors='replace').replace("?"," ")
  TypeError: a bytes-like object is required, not 'str'

  line 12 ==> text = getTextWaPo(articleURL)

What should I do?

2 answers:

Answer 0: (score: 0)

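You have to change the last line from

    return text.encode('ascii', errors='replace').replace("?"," ")

to

    return text.encode('ascii', errors='replace').replace(b"?", b" ")

because after encode() you are working with bytes, and bytes.replace() only accepts bytes-like arguments, not str.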
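To see the difference in isolation, here is a minimal sketch that does not touch the network. The sample string and the variable names are made up for illustration; only str.encode() and bytes.replace() come from the question.

```python
# Hypothetical sample text containing two non-ASCII characters.
text = "caf\u00e9 na\u00efve"

# encode() returns bytes; characters that ASCII cannot represent become b"?".
encoded = text.encode("ascii", errors="replace")

# Passing str arguments to bytes.replace() raises the TypeError from the question.
try:
    encoded.replace("?", " ")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Passing bytes arguments works and returns a new bytes object.
cleaned = encoded.replace(b"?", b" ")

print(type(encoded).__name__, encoded)
print(error_message)
print(cleaned)
```

Note that the result is still bytes, not str, so anything that consumes it later has to accept bytes as well.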

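Answer 1: (score: 0)

On line 12 you encode the data, so from that point on you have to work with bytes. Change the call to replace(b"?", b" ").

The code would look something like this: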

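from urllib.request import urlopen  # Python 3 replacement for urllib2.urlopen
from bs4 import BeautifulSoup

def getTextWaPo(url):
    # Download the page and decode the raw bytes into a str
    page = urlopen(url).read().decode('utf8')
    soup = BeautifulSoup(page, "lxml")
    # Join the text of every <article> element into one string
    text = ' '.join(map(lambda p: p.text, soup.find_all('article')))
    # encode() returns bytes, so replace() needs bytes arguments
    return text.encode('ascii', errors='replace').replace(b"?", b" ")

getTextWaPo("https://stackoverflow.com/")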
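One thing to keep in mind: with this fix the function returns bytes. If the rest of your summarizer expects str (an assumption on my part, the question does not show that code), you could decode the result again. The sketch below shows that, plus a variant that never leaves str. Both helper names are hypothetical and not part of the original code.

```python
def to_ascii_text(text: str) -> str:
    """Replace non-ASCII characters with spaces and return a str.

    Like the original code, this also turns literal '?' characters into spaces.
    """
    raw = text.encode("ascii", errors="replace")
    return raw.replace(b"?", b" ").decode("ascii")


def replace_non_ascii(text: str) -> str:
    """Variant that stays in str and leaves literal '?' characters alone."""
    return "".join(ch if ord(ch) < 128 else " " for ch in text)


sample = "caf\u00e9?"  # hypothetical input: one non-ASCII character and a real '?'
print(repr(to_ascii_text(sample)))
print(repr(replace_non_ascii(sample)))
```

Which one fits depends on whether question marks in the article text matter to you; the second variant changes the behavior of the original line, so treat it as an alternative rather than a drop-in fix.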