Error when scraping application data

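I have this link: http://www.brothersoft.com/windows/categories.html

I want to get the download link, application name, application link, publisher, and last updated date for each application.

The output should look like this: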

http://www.brothersoft.com/windows/mp3_audio/midi_tools/

I tried this code:

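from urllib.request import urlopen
from bs4 import BeautifulSoup

url = 'http://www.brothersoft.com/windows/categories.html'

# Download the category page.
pageHtml = urlopen(url).read()

# Name the parser explicitly so bs4 does not have to guess one.
soup = BeautifulSoup(pageHtml, "html.parser")

# Note: find("a") returns only the first link inside each div.
sAll = [div.find("a") for div in soup.find_all("div", attrs={"class": "brLeft"})]

for i in sAll:
    print(i)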

But it does not return all of the links.

Please help me.

How do I write a script that scrapes this metadata?
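
One possible cause of the missing links, assuming each `div.brLeft` block holds more than one `<a>` tag: `find("a")` returns only the first match in each div, while `find_all("a")` returns every match. Below is a minimal sketch of that difference. The `SAMPLE_HTML` markup and the `collect_links` helper are hypothetical stand-ins, since the real page structure is not shown here, and the selector may need adjusting against the live page.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "http://www.brothersoft.com/windows/categories.html"

# Hypothetical markup standing in for the real page (its structure is an assumption).
SAMPLE_HTML = """
<div class="brLeft">
  <a href="/windows/mp3_audio/">MP3 &amp; Audio</a>
  <a href="/windows/mp3_audio/midi_tools/">MIDI Tools</a>
</div>
<div class="brLeft">
  <a href="/windows/games/">Games</a>
</div>
"""


def collect_links(html, base_url):
    """Return absolute URLs for every <a href> inside div.brLeft blocks."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for div in soup.find_all("div", class_="brLeft"):
        # find_all returns every matching anchor; find would stop at the first.
        for anchor in div.find_all("a", href=True):
            # Resolve relative hrefs against the page URL.
            links.append(urljoin(base_url, anchor["href"]))
    return links


if __name__ == "__main__":
    for link in collect_links(SAMPLE_HTML, BASE_URL):
        print(link)
```

To run this against the live page, the downloaded HTML would be passed in place of `SAMPLE_HTML`. Collecting the application name, publisher, and last updated date would then presumably mean visiting each category link and parsing those pages the same way, which this sketch does not attempt.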

0 answers: