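I have a Python script that downloads all of a given author's quotes from Goodreads. It is run with the following command:
goodreadsquotes.py https://www.goodreads.com/author/quotes/1791.Seth_Godin > godin
However, since I am a beginner with Python, I am having trouble running it. I currently get 2 errors. The code is as follows: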
from pyquery import PyQuery
import sys, random, re, time

AUTHOR_REX = re.compile('\d+\.(\w+)$')

def grabber(base_url, i=1):
    url = base_url + "?page=" + str(i)
    page = PyQuery(url)
    quotes = page(".quoteText")
    auth_match = re.search(AUTHOR_REX, base_url)
    if auth_match:
        author = re.sub('_', ' ', auth_match.group(1))
    else:
        author = False
    # sys.stderr.write(url + "\n")
    for quote in quotes.items():
        quote = quote.remove('script').text().encode('ascii', 'ignore')
        if author:
            quote = quote.replace(author, " -- " + author)
        print (quote)
        print ('%')
    if not page('.next_page').hasClass('disabled'):
        time.sleep(10)
        grabber(base_url, i + 1)

if __name__ == "__main__":
    grabber(''.join(sys.argv[1:]))
When I run it:
py goodreadsquotes.py https://www.goodreads.com/author/quotes/1791.Seth_Godin > godin
I get the following error:
Traceback (most recent call last):
  File "goodreadsquotes.py", line 43, in <module>
    grabber(''.join(sys.argv[1:]))
  File "goodreadsquotes.py", line 34, in grabber
    quote = quote.replace(author, " -- " + author)
TypeError: a bytes-like object is required, not 'str'
Answer 0 (score: 0)
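Judging from the traceback you posted, the encode() method in Python returns a bytes object, so quote is no longer a string; it is a bytes object. Calling replace() on quote therefore requires both arguments to be bytes, not str. You can convert author and the replacement string to bytes as follows (line 34):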
# Convert both author and the replacement string to bytes, but only when
# an author was matched: bytes(False, 'ascii') would raise a TypeError.
if author:
    author_bytes = bytes(author, 'ascii')
    replace_string_bytes = bytes(" -- " + author, 'ascii')
    quote = quote.replace(author_bytes, replace_string_bytes)
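An alternative worth considering is to keep quote as a str throughout, so that replace() can take ordinary strings. The following is only a minimal sketch of that idea, not the script's actual code: format_quote is a hypothetical helper name, and the sample text is made up for illustration. It assumes the encode('ascii', 'ignore') call is there only to drop non-ASCII characters, so decoding straight back to str preserves that behaviour.

```python
from typing import Optional


def format_quote(text: str, author: Optional[str]) -> str:
    """Drop non-ASCII characters and prefix the author name with ' -- '.

    Hypothetical helper: it mirrors the body of the for loop in grabber().
    """
    # encode() returns bytes; decode() turns the result back into str,
    # so the str arguments passed to replace() below are valid.
    cleaned = text.encode('ascii', 'ignore').decode('ascii')
    if author:
        cleaned = cleaned.replace(author, " -- " + author)
    return cleaned


if __name__ == "__main__":
    # Made-up sample text, standing in for the contents of a .quoteText element.
    print(format_quote("Some quote text. Seth Godin", "Seth Godin"))
```

Keeping the value as str also affects the output file: in Python 3, print() on a bytes object writes its b'...' representation, whereas print() on a str writes the text itself.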