Is there a way to count the number of scraped results in BeautifulSoup?
Here is the code.
import requests
from bs4 import BeautifulSoup

def crawl_first_url(max_page):
    page = 1
    while page <= max_page:
        url = 'http://www.hdwallpapers.in/page/' + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, 'html.parser')
        for div in soup.select('.thumb a'):
            href = 'http://www.hdwallpapers.in' + div.get('href')
            crawl_second_url(href)
        page += 1

def crawl_second_url(second_href):
    # need to count the number of results here.
    # I tried len(second_href), but it doesn't work well.

crawl_first_url(1)
I want the second function to count the number of crawled results; for example, if 19 URLs have been crawled, I want that count.
Answer 0 (score: 1)
Since you only want to count the number of results, there is no reason to have a separate function; just add a counter.
page = 1
numResults = 0
while page <= max_page:
    url = 'http://www.hdwallpapers.in/page/' + str(page)
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, 'html.parser')
    for div in soup.select('.thumb a'):
        href = 'http://www.hdwallpapers.in' + div.get('href')
        numResults += 1
    page += 1
print("There are " + str(numResults) + " results.")
This only counts the number of sub-page links. If you also want to count the top-level pages, just add another increment after the soup line. You may also want to add a try: except: block to avoid crashes, as sketched below.
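A minimal sketch of how both suggestions might fit together, assuming the same page structure as above. The names numPages and numResults are illustrative, and the try/except only guards the network request:

import requests
from bs4 import BeautifulSoup

def crawl_first_url(max_page):
    # numResults counts thumbnail links, numPages counts top-level pages fetched.
    numResults = 0
    numPages = 0
    page = 1
    while page <= max_page:
        url = 'http://www.hdwallpapers.in/page/' + str(page)
        try:
            source_code = requests.get(url)
            source_code.raise_for_status()
        except requests.RequestException:
            # Skip pages that fail to download instead of crashing.
            page += 1
            continue
        soup = BeautifulSoup(source_code.text, 'html.parser')
        numPages += 1
        for div in soup.select('.thumb a'):
            href = 'http://www.hdwallpapers.in' + div.get('href')
            numResults += 1
        page += 1
    print("Crawled " + str(numPages) + " pages and found " + str(numResults) + " results.")

crawl_first_url(1)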