Question

I'm new to Python and have a simple question about a specific situation.

I'm trying to use BeautifulSoup to parse a series of pages.

from bs4 import BeautifulSoup
import urllib.request

BeautifulSoup(urllib.request.urlopen('http://bit.ly/'))

Traceback ...

html.parser.HTMLParseError: expected name token at '<!=KN\x01...

I'm working with Python 3.2 on 64-bit Windows 7.

Do I need mechanize? (That would require Python 2.x.)

Answer 1

If that URL is correct, you are asking why an HTML parser throws an error when it is handed an MP3 file. I believe the answer to that is self-evident...
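
One way to catch this before the parser does is to look at the Content-Type header the server sends back. What follows is only a minimal sketch, assuming Python 3 and a server that reports its media type honestly; looks_like_html and fetch_html are hypothetical helper names, not part of BeautifulSoup or urllib.

```python
import urllib.request


def looks_like_html(content_type):
    """Return True if a Content-Type header value describes an HTML document."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" and compare the media type only.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in ("text/html", "application/xhtml+xml")


def fetch_html(url):
    """Return the response body as bytes, or None if the server says it is not HTML."""
    with urllib.request.urlopen(url) as response:
        if not looks_like_html(response.headers.get("Content-Type")):
            return None
        return response.read()
```

With this in place, the body would only be handed to BeautifulSoup when fetch_html returns something other than None. An MP3 served as audio/mpeg would be skipped instead of raising HTMLParseError.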

Answer 2

If you are trying to download that MP3, you can do the following:

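# Python 3 (on Python 2 the same function lives in urllib2)
import urllib.request

BLOCK_SIZE = 16 * 1024

req = urllib.request.urlopen("http://bit.ly/xg7enD")
# Make sure to write as a binary file
fp = open("someMP3.mp3", 'wb')
try:
    # Read fixed-size blocks so the whole file is never held in memory
    while True:
        data = req.read(BLOCK_SIZE)
        if not data:
            break
        fp.write(data)
finally:
    fp.close()
    req.close()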
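
The read loop above works on any file-like object, so it can be tried without touching the network. The sketch below factors it into a helper and runs it on in-memory streams; copy_in_chunks is a hypothetical name, and the 40000-byte payload is made-up test data standing in for the MP3.

```python
import io


def copy_in_chunks(src, dst, block_size=16 * 1024):
    """Copy src to dst block by block and return the number of bytes written."""
    total = 0
    while True:
        data = src.read(block_size)
        if not data:  # an empty bytes object signals end of stream
            break
        dst.write(data)
        total += len(data)
    return total


# In-memory streams stand in for the HTTP response and the output file.
source = io.BytesIO(b"\x00" * 40000)
target = io.BytesIO()
copied = copy_in_chunks(source, target)
```

The standard library's shutil.copyfileobj performs the same kind of block-wise copy, so in real code it may be simpler to call that than to write the loop by hand.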

Answer 3

If you want to download a file in Python, you can also use this:

# Python 3 (on Python 2 this is urllib.urlretrieve)
import urllib.request
urllib.request.urlretrieve("http://bit.ly/xg7enD", "myfile.mp3")

It will save the file in the current working directory under the name "myfile.mp3". I have been able to download all kinds of files this way.

Hope this helps!

Answer 4

Instead of urllib.request, I would suggest using the requests library and its get() function:

from requests import get
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    get(url="http://www.google.com").content,
    'html.parser'
)

BeautifulSoup HTMLParseError
