Python error when trying to split a book into chapters

Date: 2019-03-07 02:58:10

Tags: python

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from urllib.request import urlopen

#Reading the text of novel from a website
huck_fin_url = 'http://www.gutenberg.org/files/76/76-0.txt'
df = urlopen(huck_fin_url)
huck_fin_text = df.read()
#print(huck_fin_text)
huck_fin_chapters = huck_fin_text.split('CHAPTER ')[1:]

Error

File "/Users/richxxxxx/Documents/ReadBooks.py", line 19, in <module>
    huck_fin_chapters = huck_fin_text.split('CHAPTER')[1:]

TypeError: a bytes-like object is required, not 'str'

2 answers:

Answer 0: (score: 0)

from urllib.request import urlopen

huck_fin_url = 'http://www.gutenberg.org/files/76/76-0.txt'  
df = urlopen(huck_fin_url)  
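huck_fin_text = str(df.read(), "utf-8")
huck_fin_chapters = huck_fin_text.split('CHAPTER ')[1:]
print(huck_fin_chapters)

You have to wrap df.read() in str(), passing the encoding as the second argument so that the bytes are decoded into text.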
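
A small illustration of how str() behaves on bytes, using made-up sample bytes rather than the actual book file. Without an encoding argument, str() returns the printable representation of the bytes object (including the b'...' wrapper and escaped line breaks); with one, it decodes the bytes:

```python
# Hypothetical sample standing in for the result of df.read()
raw = b"CHAPTER I.\r\nYou do not know about me"

as_repr = str(raw)           # printable representation of the bytes object
as_text = str(raw, "utf-8")  # decoded text, same as raw.decode("utf-8")

print(as_repr)
print(as_text)
```

Splitting the first form would still run, but every chunk would contain literal backslash sequences instead of real line breaks.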

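Answer 1: (score: 0)

The response returned by urlopen yields bytes rather than a str when read, and bytes.split() only accepts a bytes-like separator, so passing the str 'CHAPTER ' raises the TypeError. You first need to decode the bytes using the correct character set: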

from urllib.request import urlopen

#Reading the text of novel from a website
huck_fin_url = 'http://www.gutenberg.org/files/76/76-0.txt'
df = urlopen(huck_fin_url)
huck_fin_text = df.read().decode("utf8")
#print(huck_fin_text)
huck_fin_chapters = huck_fin_text.split('CHAPTER ')[1:]
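
The same idea can be tried without a network connection. The following is a minimal sketch, assuming the text is UTF-8; the helper name split_chapters and the sample bytes are hypothetical and only stand in for the downloaded book:

```python
def split_chapters(raw: bytes, encoding: str = "utf-8") -> list:
    """Decode raw bytes to text and split it on the 'CHAPTER ' marker."""
    text = raw.decode(encoding)
    # The first piece is whatever precedes the first marker, so drop it
    return text.split("CHAPTER ")[1:]


# Hypothetical sample standing in for df.read()
sample = (
    b"Front matter\r\n"
    b"CHAPTER I.\r\nFirst chapter text.\r\n"
    b"CHAPTER II.\r\nSecond chapter text.\r\n"
)

chapters = split_chapters(sample)

# Alternative: stay in bytes and pass a bytes separator instead
raw_chapters = sample.split(b"CHAPTER ")[1:]

# Mixing the two types reproduces the error from the question
try:
    sample.split("CHAPTER ")
    mixed_error = None
except TypeError as exc:
    mixed_error = exc

print(len(chapters), len(raw_chapters), type(mixed_error).__name__)
```

Decoding first is usually the more convenient choice, since later text processing then works on ordinary strings.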