BeautifulSoup doesn't seem to parse anything

Time: 2017-11-16 18:08:48

Tags: python

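I've been trying to learn BeautifulSoup by writing myself a proxy scraper, and I've run into a problem. BeautifulSoup doesn't seem to find anything, and when I print what it parsed, it shows me this: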

<html>
 <head>
 </head>
 <body>
  <bound 0x7f977c9121d0="" <http.client.httpresponse="" at="" httpresponse.read="" method="" object="" of="">
&gt;
  </bound>
 </body>
</html>

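I've tried changing the website I parse and the parser itself (lxml, html.parser, html5lib), but nothing changes: whatever I do, I get exactly the same result. Here is my code. Can anyone explain what's wrong?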

from bs4 import BeautifulSoup
import urllib
import html5lib

class Websites:

    def __init__(self):
        self.header = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"}

    def free_proxy_list(self):
        print("Connecting to free-proxy-list.net ...")

        url = "https://free-proxy-list.net"
        req = urllib.request.Request(url, None, self.header)
        content = urllib.request.urlopen(req).read
        soup = BeautifulSoup(str(content), "html5lib")

        print("Connected. Loading the page ...")

        print("Print page")
        print("")
        print(soup.prettify())

1 Answer:

Answer 0: (score: 0)

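You wrote urllib.request.urlopen(req).read, which only references the method without calling it; the correct syntax is urllib.request.urlopen(req).read(). Without the parentheses, content holds the bound method object, and str(content) produces its text representation, which is exactly the "<bound method ...>" string that BeautifulSoup ended up parsing. You also never closed the connection; I have fixed that for you in the code below.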
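
To see the difference without touching the network, here is a minimal sketch. It assumes io.BytesIO as a stand-in for the HTTP response (both expose a read() method); the variable names are only for illustration.

```python
import io

# Assumption: io.BytesIO stands in for the HTTP response so that the example
# runs offline. Like the real response object, it has a read() method.
fake_response = io.BytesIO(b"<html><body><p>hello</p></body></html>")

not_called = fake_response.read   # the method object itself; nothing is read
called = fake_response.read()     # the bytes returned by actually calling it

# str() of a method gives a description of the method, not the page content.
method_text = str(not_called)

print(method_text)  # a "<... method read of ...>" description
print(called)       # the actual HTML bytes
```

Feeding method_text to BeautifulSoup reproduces the kind of output shown in the question, because the parser only ever sees that description string.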

A better way to open the connection is the with urllib.request.urlopen(url) as req: syntax, because it closes the connection for you.

from bs4 import BeautifulSoup
import urllib.request  # import the submodule explicitly; a bare "import urllib" does not guarantee it is loaded
import html5lib

class Websites:

    def __init__(self):
        self.header = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"}

    def free_proxy_list(self):
        print("Connecting to free-proxy-list.net ...")

        url = "https://free-proxy-list.net"
        req = urllib.request.Request(url, None, self.header)
        content = urllib.request.urlopen(req)
        html = content.read()
        soup = BeautifulSoup(html, "html5lib")  # pass the bytes directly; str(html) would yield "b'...'"

        print("Connected. Loading the page ...")

        print("Print page")
        print("")
        print(soup.prettify())
        content.close()  # Important to close the connection
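
The with form mentioned above could look roughly like this. It is only a sketch: fetch_html is a hypothetical helper name, not something from the original code, and it returns the raw bytes so that the parsing step stays separate.

```python
import urllib.request


def fetch_html(url, headers=None):
    """Return the raw response body for url as bytes.

    fetch_html is a hypothetical helper used for illustration only.
    """
    req = urllib.request.Request(url, None, headers or {})
    # The with block closes the connection when it exits, even on errors,
    # so no explicit close() call is needed.
    with urllib.request.urlopen(req) as response:
        return response.read()  # note the parentheses: call the method
```

The returned bytes can then be handed to the parser, for example BeautifulSoup(fetch_html(url, self.header), "html5lib").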

For more details, see: https://docs.python.org/3.0/library/urllib.request.html#examples