I want to get all the links with the class list-chapter from the ULs, but I only get half of the links I want, because they are split between two <ul> blocks inside a <div>, e.g. <div><ul>links1</ul><ul>links2</ul></div>.
I'm new to Python and I'm really stuck.
If possible, I would also like to prepend "http://www.example.com" to each link and save them one by one in a list, so that I can access them with list[1].
Thanks, here is the code:
# import libraries
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup
"""Getting Started Example for Python 2.7+/3.3+"""
chapter = 1
chapterlist = 1
links = []
name = ""
reallink = ""
while chapter < 31:
    quote_page = Request('http://website.com/page.html?page=' + str(chapter) + '&per-page=50', headers={'User-Agent': 'Mosezilla/5.0'})
    page = urlopen(quote_page).read()
    soup = BeautifulSoup(page, "html.parser")
    name_box = soup.find("ul", attrs={"class": "list-chapter"})
    links += name_box.find_all("a")
    reallink += str([a['href'] for a in links])
    chapter += 1
f = open("links.txt", "w+")
i = 1
f.write(reallink)
f.close()
Answer 0 (score: 0)
The find you are using returns only the first match, while find_all returns a list of matches. Assuming your ul class is correct, I would use select instead and collect the child a tags from every matching list.
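To see why that matters, here is a small self-contained illustration; the HTML below is a made-up stand-in for the structure described in the question (two ul.list-chapter blocks inside one div):

from bs4 import BeautifulSoup

# Toy HTML mirroring the question's structure: two ULs inside one DIV.
html = """
<div>
  <ul class="list-chapter"><li><a href="/chapter-1">Chapter 1</a></li></ul>
  <ul class="list-chapter"><li><a href="/chapter-2">Chapter 2</a></li></ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find() stops at the first matching UL, so half the links are missed.
first_ul = soup.find("ul", attrs={"class": "list-chapter"})
print(len(first_ul.find_all("a")))            # 1

# select() matches every <a> under every ul.list-chapter.
print(len(soup.select("ul.list-chapter a")))  # 2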
Replace these lines:
name_box = soup.find("ul", attrs={"class": "list-chapter"})
links += name_box.find_all("a")
reallink += str([a['href'] for a in links])
with
reallinks = ['http://www.example.com' + item['href'] for item in soup.select('ul.list-chapter a')]  # I'm assuming href already has a leading /
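Putting that back into your loop, a minimal sketch of the whole script could look like this (assumptions: the real base URL is http://www.example.com, each href already starts with a leading /, and you want one full URL per line in links.txt so the links can be indexed later):

from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

base_url = 'http://www.example.com'  # assumed prefix from the question
reallinks = []                       # one full URL per entry, usable as reallinks[1]

for chapter in range(1, 31):
    quote_page = Request('http://website.com/page.html?page=' + str(chapter) + '&per-page=50',
                         headers={'User-Agent': 'Mosezilla/5.0'})
    soup = BeautifulSoup(urlopen(quote_page).read(), "html.parser")
    # select() walks every ul.list-chapter, so links from both ULs are collected.
    reallinks += [base_url + a['href'] for a in soup.select('ul.list-chapter a')]

with open("links.txt", "w") as f:
    f.write("\n".join(reallinks))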