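I am trying to scrape https://en.wikiquote.org/wiki/Remember_the_Titans#Coach_Boone and want the quotes from every section except Dialogue, Taglines, and External links. I can select ul > li, but that pulls out everything on the page. How can I get only the ul > li elements that follow the HTML below?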
<h2><span class="mw-headline" id="Coach_Boone">Coach Boone</span><span class="mw-editsection"><span class="mw-editsection-bracket">[</span><a href="/w/index.php?title=Remember_the_Titans&action=edit&section=1" title="Edit section: Coach Boone">edit</a><span class="mw-editsection-bracket">]</span></span></h2>
Answer 0 (score: 2)
Once you have located the h2 element, use the .find_next_siblings() method to get the ul sibling elements that follow it:
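# soup is assumed to be a BeautifulSoup object already built from the page.
# Locate the section's headline span, then climb to the enclosing <h2>.
h2 = soup.find("span", id="Coach_Boone").find_parent('h2')
# Walk every <ul> sibling that follows the heading and print its list items.
for ul in h2.find_next_siblings("ul"):
    for li in ul.find_all("li"):
        print(li)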
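
Note that find_next_siblings("ul") returns every later ul sibling, so on a page with several sections it may also pick up lists that belong to the headings further down. One possible way to keep each section separate is to walk the siblings and stop at the next h2. The following is only a minimal sketch under stated assumptions: SAMPLE_HTML is a made-up, trimmed stand-in for the real page, quotes_by_section and SKIPPED are hypothetical names, and the ids in SKIPPED are guessed from the section titles in the question rather than checked against the live markup.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down stand-in for the page (not the real markup).
SAMPLE_HTML = """
<div>
<h2><span class="mw-headline" id="Coach_Boone">Coach Boone</span></h2>
<ul><li>Boone quote 1</li></ul>
<ul><li>Boone quote 2</li></ul>
<h2><span class="mw-headline" id="Dialogue">Dialogue</span></h2>
<ul><li>Dialogue line</li></ul>
<h2><span class="mw-headline" id="External_links">External links</span></h2>
<ul><li>Some link</li></ul>
</div>
"""

# Assumed ids of the sections to leave out (guessed from the section titles).
SKIPPED = {"Dialogue", "Taglines", "External_links"}


def quotes_by_section(html, skipped=SKIPPED):
    """Map each wanted section id to the text of the <li> items under it."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for headline in soup.find_all("span", class_="mw-headline"):
        section_id = headline.get("id")
        if section_id in skipped:
            continue
        h2 = headline.find_parent("h2")
        if h2 is None:
            continue
        quotes = []
        # Only look at following <h2> and <ul> siblings, in document order.
        for sibling in h2.find_next_siblings(["h2", "ul"]):
            if sibling.name == "h2":
                break  # the next section starts here
            # Top-level items only; nested lists are not handled separately.
            for li in sibling.find_all("li", recursive=False):
                quotes.append(li.get_text(strip=True))
        result[section_id] = quotes
    return result


sections = quotes_by_section(SAMPLE_HTML)
print(sections)
```

For the real page, the same function could be given the downloaded HTML instead of SAMPLE_HTML, provided the headings and lists there really are siblings as they are in this sample.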