Question

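I'm new to Python. Can someone explain what findAll("a") means in the code below? Can I use any other letter instead, such as g, h, or m? Does 'a' mean finding the letter "a" in the article?

And does href=re.compile("^(/wiki/)((?!:).)*$") mean finding links whose names contain wiki?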

from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

html = urlopen("http://en.wikipedia.org/wiki/Kevin_Bacon")
# Name the parser explicitly so BeautifulSoup does not have to guess one
bsObj = BeautifulSoup(html, "html.parser")
# Look only inside the div whose id is "bodyContent"
for link in bsObj.find("div", {"id": "bodyContent"}).findAll(
        "a", href=re.compile("^(/wiki/)((?!:).)*$")):
    if 'href' in link.attrs:
        print(link.attrs['href'])

Can anyone recommend some good books for learning web scraping in Python 3.6 that a beginner can follow easily?

Answer 1

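findAll("a") searches for all "a" (anchor) tags, which are the HTML elements used for links.

Yes, you can use 'h1', 'b', 'strong', or any other valid HTML tag name in place of 'a'. The argument is a tag name, not a letter to look for in the text, so 'g' or 'm' would simply match nothing.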
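
As a minimal sketch of that point, the snippet below runs the same kind of search against a small HTML string instead of the live Wikipedia page. The HTML content is made up for illustration, and find_all is the current spelling of findAll in bs4.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, made up for illustration only
html = """
<div id="bodyContent">
  <h1>Kevin Bacon</h1>
  <p>An <b>actor</b> with <a href="/wiki/Footloose">one link</a>
  and <a href="/wiki/Apollo_13">another</a>.</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

anchors = soup.find_all("a")   # every <a> tag
bolds = soup.find_all("b")     # every <b> tag
unknown = soup.find_all("g")   # no <g> tags in this HTML, so nothing matches

print([a["href"] for a in anchors])
print([b.get_text() for b in bolds])
print(len(unknown))
```

The search is by tag name, so the letter "a" appearing in the page text has no effect on the result.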

You can learn more about BeautifulSoup in its documentation.

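Similarly, re.compile("^(/wiki/)((?!:).)*$") matches every link whose href starts with /wiki/ and contains no colon, which is stricter than just containing "wiki".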
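
To check what the pattern accepts, here is a small sketch that applies it to a few sample href values. The sample paths are assumptions chosen for illustration, not values taken from the actual page.

```python
import re

# Same pattern as in the question
pattern = re.compile("^(/wiki/)((?!:).)*$")

# Hypothetical href values, chosen to exercise each part of the pattern
hrefs = [
    "/wiki/Kevin_Bacon",               # starts with /wiki/, no colon
    "/wiki/Category:American_actors",  # starts with /wiki/ but has a colon
    "/w/index.php?title=Kevin_Bacon",  # does not start with /wiki/
    "/static/wiki/logo.png",           # contains wiki, but not at the start
]

matching = [h for h in hrefs if pattern.search(h)]
print(matching)  # ['/wiki/Kevin_Bacon']
```

The ^(/wiki/) part anchors the match to the start of the href, and ((?!:).)*$ requires every remaining character up to the end to be something other than a colon, which skips namespace pages such as Category: or File: links.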
