Python BeautifulSoup: Is there a way to count the number of scraped results?

Date: 2015-12-22 15:35:26

Tags: python python-3.x beautifulsoup

Is there a way to count the number of results scraped with BeautifulSoup?

Here is the code:

import requests
from bs4 import BeautifulSoup

def crawl_first_url(max_page):
    page = 1

    while page <= max_page:
        url = 'http://www.hdwallpapers.in/page/' + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, 'html.parser')

        for div in soup.select('.thumb a'):
            href = 'http://www.hdwallpapers.in' + div.get('href')
            crawl_second_url(href)
        page += 1

def crawl_second_url(second_href):
    # Need to count the number of results here.
    # I tried len(second_href), but it doesn't work well.
    pass  # placeholder so the empty function body is valid Python

crawl_first_url(1)

I want the second function to count the number of crawled results. For example, if 19 URLs have been scraped, I want it to give me 19.
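For reference, `len(second_href)` returns the number of characters in the URL string passed in, not the number of URLs crawled so far, which is why it does not give a result count. If you want to keep `crawl_second_url` as a separate function, it needs some state that survives between calls. A minimal sketch, assuming a hypothetical `Crawler` class and made-up example URLs (neither is from the original code):

```python
class Crawler:
    """Hypothetical helper that remembers how many URLs have been crawled."""

    def __init__(self):
        self.num_results = 0  # running count shared across calls

    def crawl_second_url(self, second_href):
        # len(second_href) would return the number of characters in the
        # URL string, not the number of URLs crawled so far.
        self.num_results += 1
        # ... fetch and parse second_href here ...
        return self.num_results


# Usage with hypothetical URLs: crawl 19 of them, then read the count.
crawler = Crawler()
for n in range(19):
    crawler.crawl_second_url('http://www.hdwallpapers.in/example-' + str(n) + '.html')
print(crawler.num_results)  # 19
```

Inside `crawl_first_url`, the loop would call `crawler.crawl_second_url(href)` instead of the module-level function; keeping the counter on an object avoids a `global` variable.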

1 answer:

Answer 0 (score: 1)

Since you only want to count the number of results, I see no reason to have a separate function. Just add a counter:

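import requests
from bs4 import BeautifulSoup

def crawl_first_url(max_page):
    page = 1
    numResults = 0

    while page <= max_page:
        url = 'http://www.hdwallpapers.in/page/' + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, 'html.parser')

        for div in soup.select('.thumb a'):
            href = 'http://www.hdwallpapers.in' + div.get('href')
            numResults += 1  # one more result link found
        page += 1

    # numResults is an int, so convert it before concatenating with strings
    print("There are " + str(numResults) + " results.")
    return numResults

crawl_first_url(1)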

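This only counts the sub-pages. If you also want to count the top-level pages, just add another increment line after the `soup = ...` line. You may also want to add a try:/except: block so that a failed request does not crash the crawl.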
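The last two suggestions can be combined. A minimal sketch, assuming the same URL scheme and `.thumb a` selector as above: it counts each successfully fetched listing page, takes the per-page result count with `len()` (since `select()` returns a list), and catches `requests.RequestException` so one failed download does not stop the loop. The function name `count_results`, the `fetch` parameter, the `BASE_URL` constant, and the 10-second timeout are illustrative choices, not part of the original answer.

```python
import requests
from bs4 import BeautifulSoup

BASE_URL = 'http://www.hdwallpapers.in'


def count_results(max_page, fetch=requests.get):
    """Count listing pages fetched and '.thumb a' links found on them.

    `fetch` defaults to requests.get; it is a parameter only so the
    function can be exercised without network access.
    """
    num_pages = 0
    num_results = 0
    for page in range(1, max_page + 1):
        url = BASE_URL + '/page/' + str(page)
        try:
            response = fetch(url, timeout=10)
            response.raise_for_status()  # treat 4xx/5xx responses as failures
        except requests.RequestException as exc:
            # Skip pages that fail to download instead of crashing the crawl.
            print("Skipping " + url + ": " + str(exc))
            continue
        num_pages += 1  # top-level page successfully fetched
        soup = BeautifulSoup(response.text, 'html.parser')
        num_results += len(soup.select('.thumb a'))  # links on this page
    return num_pages, num_results
```

Usage: `pages, results = count_results(1)` followed by `print("There are " + str(results) + " results on " + str(pages) + " pages.")`. Returning the counts instead of only printing them lets the caller decide what to do with the numbers.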