I'm writing a Python script that fetches links from a website. But when I run it against this web page, I can't get the links. My script is:
soup = BeautifulSoup(urllib2.urlopen(url))
datas = soup.findAll('div', attrs={'class':'tsrImg'})
for data in datas:
    link = data.find('a')
    print str(link.href)
It only prints None. Can anyone explain why this happens?
Answer 0 (score: 5)
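Change:
str(link.href)
to:
link.get('href')
In BeautifulSoup, link.href is shorthand for link.find('href'): it looks for a child <href> tag, finds none, and returns None. Tag attributes are read with link.get('href') or link['href'].
The full script looks like this: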
from BeautifulSoup import BeautifulSoup
import urllib2
url = 'http://www.meinpaket.de/de/shopsList.html?page=1'
soup = BeautifulSoup(urllib2.urlopen(url))
datas = soup.findAll('div', {'class':'tsrImg'})
for data in datas:
    link = data.find('a')
    print link.get('href')
Output:
/de/~-office-partner-gmbh-;jsessionid=11957F27FC2D888A34532D9848C922FB.as03
/de/~-24selling-de;jsessionid=11957F27FC2D888A34532D9848C922FB.as03
/de/~abalisi-kuenstlerbedarf-shop;jsessionid=11957F27FC2D888A34532D9848C922FB.as03
/de/~abcmeineverpackung-de-kg;jsessionid=11957F27FC2D888A34532D9848C922FB.as03
/de/~ability;jsessionid=11957F27FC2D888A34532D9848C922FB.as03
/de/~ac-foto-handels-gmbh;jsessionid=11957F27FC2D888A34532D9848C922FB.as03
/de/~ac-sat-corner-inh-dirk-hahn;jsessionid=11957F27FC2D888A34532D9848C922FB.as03
/de/~adamo-fashion-gmbh-shop;jsessionid=11957F27FC2D888A34532D9848C922FB.as03
/de/~adapter-markt;jsessionid=11957F27FC2D888A34532D9848C922FB.as03
/de/~adko;jsessionid=11957F27FC2D888A34532D9848C922FB.as03
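The hrefs printed above are site-relative and carry a ;jsessionid suffix, so they may need one more step before they can be fetched or compared. The following is only a sketch of that step, not part of the original answer: it is written for Python 3 (urllib.parse) rather than the Python 2 used in the thread, and the helper name absolute_link is hypothetical.

```python
from urllib.parse import urljoin, urlparse

# Page URL taken from the script in the answer.
BASE_URL = 'http://www.meinpaket.de/de/shopsList.html?page=1'


def absolute_link(href, base=BASE_URL):
    """Resolve href against the page URL and drop any ';params' suffix.

    Hypothetical helper for illustration only.
    """
    parts = urlparse(urljoin(base, href))
    # urlparse stores the ';jsessionid=...' part of the last path
    # segment in .params, so clearing it removes the session id.
    return parts._replace(params='').geturl()


if __name__ == '__main__':
    href = '/de/~adko;jsessionid=11957F27FC2D888A34532D9848C922FB.as03'
    print(absolute_link(href))  # http://www.meinpaket.de/de/~adko
```

Whether the session id should be dropped depends on the site. If the server relies on it, keep the result of urljoin as-is and skip the _replace step.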