Question

I tried this, but I get no results. The program just runs and finishes with exit code 0. Everything before the for loop works correctly (I checked with print()).

from bs4 import BeautifulSoup
import requests

def webscrawling(max_pages):
    page = 1
    while page <= max_pages:
        url = "https://webscraper.io/test-sites/e-commerce/allinone" + str(page)
        sourcecode = requests.get(url)
        plaintext = sourcecode.text
        soup = BeautifulSoup(plaintext, "html.parser")
        for link in soup.findAll('a', {'class' : 'title'}):
            show = link.get('href')
            print(show)
        page += 1

webscrawling(2)

Answer 1

I ran your code and printed the response object returned for each url. These are the results:

<Response [404]>
<Response [404]>

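As you can see, the server does not return a usable page. You can also try opening these links in a browser, and you will get a 404 error. The problem is that no such web page exists.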
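A 404 does not stop the script on its own: requests.get() still returns a response object, and its body (the error page) is parsed like any other HTML. As a minimal sketch, assuming you would rather fail loudly than parse an error page, the status can be checked before parsing. The helper name html_or_raise is made up for this illustration.

```python
import requests


def html_or_raise(response):
    """Return the body of a successful response.

    html_or_raise is a hypothetical helper name. raise_for_status() raises
    requests.HTTPError for 4xx/5xx status codes such as 404.
    """
    response.raise_for_status()
    return response.text


# Usage (performs a real network request, so it is left commented out):
# url = "https://webscraper.io/test-sites/e-commerce/allinone"
# html = html_or_raise(requests.get(url, timeout=10))
```

With the URLs from the question, this would raise an exception instead of finishing silently.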

However, apart from the URL, there is nothing wrong with your code. For example, here is my edited version of it, with comments in the code.

from bs4 import BeautifulSoup
import requests

def webscrawling(max_page):
    page = 1
    while page <= max_page:
        # This URL is valid once the page number is no longer appended
        url = "https://webscraper.io/test-sites/e-commerce/allinone"
        sourcecode = requests.get(url)
        # Print the response to see the server's status (200 means OK)
        print(sourcecode)
        plaintext = sourcecode.text
        soup = BeautifulSoup(plaintext, "html.parser")
        for link in soup.findAll('a', {'class': 'title'}):
            show = link.get('href')
            print(show)
        page += 1

webscrawling(1)

This is the output of the edited code:

<Response [200]>
/test-sites/e-commerce/allinone/product/219
/test-sites/e-commerce/allinone/product/296
/test-sites/e-commerce/allinone/product/286

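Edit: OK, the website exists. We can reach it at "https://webscraper.io/test-sites/e-commerce/allinone" without any problem. But your code does not visit this site; it goes somewhere else. Your program requests "https://webscraper.io/test-sites/e-commerce/allinone1", and the difference is the last character of the URL. To make it easier to see: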

https://webscraper.io/test-sites/e-commerce/allinone
https://webscraper.io/test-sites/e-commerce/allinone1

As you can see, the two links are different. Your program builds the url on the following line:

url = "https://webscraper.io/test-sites/e-commerce/allinone"+str(page)

As you can see, str(page) is appended to the end of the URL. That is the cause of the problem. If you remove + str(page) from this line:

url = "https://webscraper.io/test-sites/e-commerce/allinone"

the URL will be correct.
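To see what the concatenation produces without making any request, here is a small sketch. The query_url function is purely hypothetical: it only illustrates what an explicit separator looks like, and nothing in this thread says the site accepts a "page" parameter.

```python
from urllib.parse import urlencode

BASE_URL = "https://webscraper.io/test-sites/e-commerce/allinone"


def concatenated_url(page):
    """Reproduce the question's URL construction: the number is glued onto the path."""
    return BASE_URL + str(page)


def query_url(page):
    """Hypothetical alternative: pass the page number as a query parameter.

    Assumption: the "page" parameter is illustrative only; it is not known
    whether this site supports it.
    """
    return BASE_URL + "?" + urlencode({"page": page})


print(concatenated_url(1))  # https://webscraper.io/test-sites/e-commerce/allinone1
print(query_url(1))         # https://webscraper.io/test-sites/e-commerce/allinone?page=1
```

Printing the URL before requesting it is a quick way to catch this kind of mistake.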

Answer 2

The page you are requesting (https://webscraper.io/test-sites/e-commerce/allinone1) contains no <a class='title'> elements.
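This also explains the silent exit with code 0: when nothing matches, the search returns an empty list, so the for loop body never runs and nothing is printed. A minimal offline sketch follows, using made-up HTML strings in place of the real pages; title_links is a hypothetical helper name.

```python
from bs4 import BeautifulSoup


def title_links(html):
    """Return the href of every <a class="title"> element in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [a.get("href") for a in soup.find_all("a", {"class": "title"})]


# Made-up HTML standing in for an error page and a product listing
error_page = "<html><body><h1>Page not found</h1><a href='/'>Home</a></body></html>"
product_page = "<html><body><a class='title' href='/product/1'>Item</a></body></html>"

for name, html in [("error page", error_page), ("product page", product_page)]:
    links = title_links(html)
    if not links:
        # Without this check, an empty result produces no output at all
        print(name + ": no <a class='title'> elements found")
    for href in links:
        print(name + ": " + href)
```

Reporting the empty case explicitly makes a wrong URL or a wrong selector visible right away.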

Why is BeautifulSoup not showing any results?
