Question

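I've been trying to learn BeautifulSoup by building myself a proxy scraper, and I've run into a problem. BeautifulSoup doesn't seem to find anything, and when I print what it parsed, it shows me this: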

<html>
 <head>
 </head>
 <body>
  <bound 0x7f977c9121d0="" <http.client.httpresponse="" at="" httpresponse.read="" method="" object="" of="">
&gt;
  </bound>
 </body>
</html>

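I've tried changing the website I parse and the parser itself (lxml, html.parser, html5lib), but nothing changes; no matter what I do, I get exactly the same result. Here is my code. Can anyone explain what's wrong?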

from bs4 import BeautifulSoup
import urllib
import html5lib

class Websites:

    def __init__(self):
        self.header = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"}

    def free_proxy_list(self):
        print("Connecting to free-proxy-list.net ...")

        url = "https://free-proxy-list.net"
        req = urllib.request.Request(url, None, self.header)
        content = urllib.request.urlopen(req).read
        soup = BeautifulSoup(str(content), "html5lib")

        print("Connected. Loading the page ...")

        print("Print page")
        print("")
        print(soup.prettify())

Answer 1

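You are calling urllib.request.urlopen(req).read without parentheses; the correct syntax is urllib.request.urlopen(req).read(). Without the call, content holds the bound method object rather than the page bytes, and str(content) turns that object's repr into the text BeautifulSoup parses, which is why the output shows <bound ...>. You also weren't closing the connection; I've fixed that for you in the corrected version of your code.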
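
To make the difference concrete, here is a minimal sketch that needs no network access. FakeResponse is a hypothetical stand-in for http.client.HTTPResponse, invented only for this illustration; the real class behaves the same way with respect to read versus read().

```python
class FakeResponse:
    """Hypothetical stand-in for http.client.HTTPResponse (illustration only)."""

    def __init__(self, body):
        self._body = body

    def read(self):
        # Return the stored body, as a real response would return the page bytes.
        return self._body


response = FakeResponse(b"<html><body><p>hello</p></body></html>")

not_called = response.read   # no parentheses: a reference to the method itself
called = response.read()     # parentheses: the method runs and returns bytes

# str() of the method reference is its repr, beginning with "<bound method ...",
# which is the text BeautifulSoup ended up parsing in the question.
print(str(not_called))
print(called)
```

An HTML parser treats that repr as a strange tag named bound, which matches the <bound ...> element in the question's output.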

A better way to open the connection is the with urllib.request.urlopen(url) as req: syntax, because it closes the connection for you.

from bs4 import BeautifulSoup
import urllib.request  # import the submodule explicitly; a bare "import urllib" does not guarantee it is loaded
import html5lib

class Websites:

    def __init__(self):
        self.header = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"}

    def free_proxy_list(self):
        print("Connecting to free-proxy-list.net ...")

        url = "https://free-proxy-list.net"
        req = urllib.request.Request(url, None, self.header)
        content = urllib.request.urlopen(req)
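        html = content.read()  # call read(); without the parentheses you get the method object
        soup = BeautifulSoup(html, "html5lib")  # pass the bytes directly; str(html) would yield the "b'...'" repr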

        print("Connected. Loading the page ...")

        print("Print page")
        print("")
        print(soup.prettify())
        content.close()  # Important to close the connection
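
The with form mentioned above could look roughly like the sketch below. The helper name fetch_html and its headers parameter are assumptions made for illustration; they are not part of the original code.

```python
import urllib.request


def fetch_html(url, headers=None):
    """Return the raw response body as bytes.

    The with block closes the connection on exit, even if read() raises,
    so no explicit close() call is needed.
    """
    req = urllib.request.Request(url, None, headers or {})
    with urllib.request.urlopen(req) as response:
        return response.read()
```

Inside free_proxy_list, the bytes returned by fetch_html(url, self.header) could then be passed to BeautifulSoup along with the parser name.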

For details, see: https://docs.python.org/3.0/library/urllib.request.html#examples

BeautifulSoup doesn't seem to parse anything
