Question

import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
url = input('Enter -')
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html,'html.parser')

tags = soup('a')
for tag in tags:
    print(tag.get('herf',None))

I tested my code with this link: http://www.dr-chuck.com/page1.htm

The output is: None

The output should be this link: http://www.dr-chuck.com/page2.htm

Answer 1

It's a simple typo: in the tag.get call, change 'herf' to 'href'.

    import urllib.request, urllib.parse, urllib.error
    from bs4 import BeautifulSoup
    url = input('Enter -')
    html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')

    # Collect every anchor tag and print its link target
    tags = soup('a')
    for tag in tags:
        print(tag.get('href', None))  # 'href', not 'herf'

Output:

#http://www.dr-chuck.com/page2.htm
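
To see why the misspelled name printed None instead of raising an error, here is a minimal offline sketch. The inline HTML string is an assumption standing in for the fetched page (the real page1.htm may contain more markup), and the variable names are only illustrative. It shows that tag.get() falls back to its default for an attribute that does not exist, while indexing the tag with the same name raises KeyError, which makes a typo like this easier to spot.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML standing in for the fetched page, so this runs offline.
html = '<p><a href="http://www.dr-chuck.com/page2.htm">Second Page</a></p>'
soup = BeautifulSoup(html, 'html.parser')
tag = soup('a')[0]

# get() returns the default (None) when the attribute is missing,
# so a misspelled name fails silently.
misspelled = tag.get('herf', None)
correct = tag.get('href', None)
print(misspelled)
print(correct)

# Indexing the tag raises KeyError for a missing attribute instead.
try:
    tag['herf']
    raised = False
except KeyError:
    raised = True
print(raised)
```

If silent None values are unwanted, tag['href'] is the stricter choice; tag.get('href') is more forgiving when some anchors have no href at all.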
