I have this code:
import urllib
import urlparse
from bs4 import BeautifulSoup

url = "http://www.downloadcrew.com/?act=search&cat=51"
pageHtml = urllib.urlopen(url)
soup = BeautifulSoup(pageHtml)
for a in soup.select("div.productListingTitle a[href]"):
    try:
        print (a["href"]).encode("utf-8","replace")
    except:
        print "no link"
        pass
But when I run it, I only get 20 links. The output should contain more than 20 links.
Answer 0 (score: 1)
That's because you are only downloading the first page of results.
Just use a loop to download all the pages:
import urllib
import urlparse
from bs4 import BeautifulSoup

# fetch pages 0, 1 and 2 via the "page" query parameter
for i in xrange(3):
    url = "http://www.downloadcrew.com/?act=search&page=%d&cat=51" % i
    pageHtml = urllib.urlopen(url)
    soup = BeautifulSoup(pageHtml)
    for a in soup.select("div.productListingTitle a[href]"):
        try:
            print (a["href"]).encode("utf-8","replace")
        except:
            print "no link"
If you don't know the number of pages in advance, you can do this:
import urllib
import urlparse
from bs4 import BeautifulSoup

i = 0
while 1:
    url = "http://www.downloadcrew.com/?act=search&page=%d&cat=51" % i
    pageHtml = urllib.urlopen(url)
    soup = BeautifulSoup(pageHtml)
    has_more = 0  # flag: did this page contain any links?
    for a in soup.select("div.productListingTitle a[href]"):
        has_more = 1
        try:
            print (a["href"]).encode("utf-8","replace")
        except:
            print "no link"
    if has_more:
        i += 1   # this page had results, so try the next one
    else:
        break    # an empty page means we are past the last page
I ran it on my machine and got 60 links from three pages. Good luck~
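
Side note: all of the code above is Python 2 (print statements, xrange, urllib.urlopen). If you are on Python 3, a minimal sketch of the same open-ended loop could look like this; it assumes only the URL pattern and CSS selector from the answer above, and swaps in urllib.request for urllib:

import urllib.request
from bs4 import BeautifulSoup

i = 0
while True:
    url = "http://www.downloadcrew.com/?act=search&page=%d&cat=51" % i
    page_html = urllib.request.urlopen(url)
    # passing an explicit parser avoids BeautifulSoup's "no parser specified" warning
    soup = BeautifulSoup(page_html, "html.parser")
    links = soup.select("div.productListingTitle a[href]")
    if not links:  # an empty page means we ran past the last page
        break
    for a in links:
        print(a["href"])
    i += 1

Collecting the links first and testing the list for emptiness replaces the has_more flag, which reads a little more naturally in Python.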