Question

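I'm trying to scrape a website, and the problem I'm running into is that the page takes time to load. So when my scrape finishes, I might have only five items when there could be 25. Is there a way to slow Python down? I'm using BeautifulSoup. This is the code I'm using: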

import urllib.request
from bs4 import BeautifulSoup

theurl = "http://agscompany.com/product-category/fittings/tube-nuts/316-tube/"
thepage = urllib.request.urlopen(theurl)  # fetch the raw HTML of the listing page
soup = BeautifulSoup(thepage, "html.parser")

# print the text of every product tile found on this page
for pn in soup.find_all('div', {"class": "shop-item-text"}):
    pn2 = pn.text
    print(pn2)

Thanks

Answer 1

All of the results are reachable from these pages:

http://agscompany.com/product-category/fittings/tube-nuts/316-tube/page/
http://agscompany.com/product-category/fittings/tube-nuts/316-tube/page/2/
...

So you can fetch them all by looping over the page numbers:

import urllib.request
from bs4 import BeautifulSoup

theurl = "http://agscompany.com/product-category/fittings/tube-nuts/316-tube/"
# theurl already ends with "/", so append "page/N/" without a leading slash
for i in range(1, 5):  # pages 1 to 4; range() excludes its upper bound
    thepage = urllib.request.urlopen(theurl + 'page/' + str(i) + '/')
    soup = BeautifulSoup(thepage, "html.parser")

    for pn in soup.find_all('div', {"class": "shop-item-text"}):
        pn2 = pn.text
        print(pn2)
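A small sketch of building the same page URLs with the standard library's `urllib.parse.urljoin` instead of string concatenation. The `page/N/` pattern comes from the URLs listed above; `page_urls` is a hypothetical helper name, not part of any library.

```python
from typing import List
from urllib.parse import urljoin

theurl = "http://agscompany.com/product-category/fittings/tube-nuts/316-tube/"


def page_urls(base: str, first: int, last: int) -> List[str]:
    """Build listing URLs for pages first..last (inclusive)."""
    # urljoin resolves "page/N/" relative to base; because base ends with "/",
    # the result has exactly one slash before "page" (no "316-tube//page").
    return [urljoin(base, f"page/{n}/") for n in range(first, last + 1)]


for url in page_urls(theurl, 1, 4):
    print(url)
```

One caveat: `urljoin` treats a base without a trailing slash as a file name, so `urljoin(".../316-tube", "page/2/")` would drop the `316-tube` segment. Keep the trailing slash on `theurl`.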

Answer 2

A more generic version of @Kenavoz's answer.

This approach doesn't depend on knowing how many pages there are.

Also, I would go with requests rather than urllib.
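A minimal sketch of a loop that does not need the page count, assuming the site either returns HTTP 404 or an empty listing once you go past the last page (common for WordPress/WooCommerce category pages, but not verified for this site). The names `iter_all_items`, `fetch_with_requests`, `parse_with_bs4` and the `max_pages` safety cap are hypothetical; fetching and parsing are passed in as functions so the stop logic can be tried without touching the network.

```python
from typing import Callable, Iterator, List, Optional

BASE_URL = "http://agscompany.com/product-category/fittings/tube-nuts/316-tube/"


def iter_all_items(
    fetch: Callable[[str], Optional[str]],
    parse: Callable[[str], List[str]],
    max_pages: int = 100,
) -> Iterator[str]:
    """Yield items page by page until a page is missing or empty."""
    for n in range(1, max_pages + 1):  # max_pages guards against endless loops
        html = fetch(f"{BASE_URL}page/{n}/")
        if html is None:  # assumed: no page here (e.g. 404), so stop
            return
        items = parse(html)
        if not items:  # an empty listing also means we ran past the end
            return
        yield from items


def fetch_with_requests(url: str) -> Optional[str]:
    import requests  # third-party: pip install requests

    resp = requests.get(url, timeout=10)
    if resp.status_code == 404:
        return None
    resp.raise_for_status()  # surface other HTTP errors instead of hiding them
    return resp.text


def parse_with_bs4(html: str) -> List[str]:
    from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

    soup = BeautifulSoup(html, "html.parser")
    return [div.get_text(strip=True)
            for div in soup.find_all("div", {"class": "shop-item-text"})]


if __name__ == "__main__":
    for item in iter_all_items(fetch_with_requests, parse_with_bs4):
        print(item)
```

If the site turns out to redirect out-of-range pages back to page 1 instead of returning 404, the empty-page check will not trigger; in that case a check for a repeated first item, or the `max_pages` cap, is what stops the loop.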
