Using BeautifulSoup's find_all on <li>

Time: 2019-08-11 13:56:49

Tags: python web-scraping beautifulsoup

I would like to use BeautifulSoup to extract the "def" part of project-data:

<div
   <ul
      <li class : "abc" project-data: "def">
      <li class : "abc" project-data: "ghi">

I tried:

soup = BeautifulSoup(driver.page_source,"html.parser")
data = soup.find('li', {'data-project': ''}).text
print(data)

Does anyone know how to retrieve this data?

2 answers:

Answer 0: (score: 1)

Assuming your HTML looks roughly like this:

<div>
 <ul>
     <li class = "abc" project-data= "def"></li>
     <li class = "abc" project-data= "ghi"></li>
   </ul>
</div>

Do this:

vals = soup.find_all("li")

for val in vals:
    print(val.attrs['project-data'])

Output:

def
ghi
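
If some <li> tags on the real page do not carry the attribute, val.attrs['project-data'] raises a KeyError. A minimal sketch of one way around that, assuming the HTML shown above plus one extra <li> without the attribute (added here only for illustration), is to pass attrs={"project-data": True} so that only tags having the attribute are matched:

```python
from bs4 import BeautifulSoup

# Sample markup: the HTML assumed above, plus one <li> without the
# attribute (hypothetical, added to show the filtering).
html = """<div>
 <ul>
     <li class="abc" project-data="def"></li>
     <li class="abc" project-data="ghi"></li>
     <li class="abc"></li>
 </ul>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# A value of True matches any <li> that has the attribute, whatever its value.
items = soup.find_all("li", attrs={"project-data": True})
values = [li["project-data"] for li in items]
print(values)

# find() returns only the first match, or None if nothing matches.
first = soup.find("li", attrs={"project-data": True})
first_value = first["project-data"] if first is not None else None
print(first_value)
```

Note that the value comes from indexing the tag (li["project-data"]), not from .text, which only returns the text between the opening and closing tags.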

Answer 1: (score: 1)

You can use the CSS selector li[project-data]. This finds all <li> tags that have a project-data attribute. In bs4, CSS selectors are invoked with the select() or select_one() methods:

from bs4 import BeautifulSoup

data = '''<div>
   <ul>
      <li class="abc" project-data="def">
      <li class="abc" project-data="ghi">'''

soup = BeautifulSoup(data, 'lxml')

for li in soup.select('li[project-data]'):
    print(li['project-data'])

Prints:

def
ghi

More information on CSS selectors here.
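
If only a single value is needed, such as the "def" from the question, select_one() may be the more convenient of the two methods. The following is a rough sketch, assuming well-formed sample markup and the built-in html.parser instead of lxml; the variable names are illustrative only:

```python
from bs4 import BeautifulSoup

data = """<div>
   <ul>
      <li class="abc" project-data="def"></li>
      <li class="abc" project-data="ghi"></li>
   </ul>
</div>"""

soup = BeautifulSoup(data, "html.parser")

# select_one() returns the first matching tag, or None when nothing matches.
first = soup.select_one("li[project-data]")
first_value = first["project-data"] if first is not None else None
print(first_value)

# An attribute selector can also match on an exact value.
ghi_tag = soup.select_one('li[project-data="ghi"]')
ghi_value = ghi_tag["project-data"] if ghi_tag is not None else None
print(ghi_value)

# No <li> has this value, so the result is None.
missing = soup.select_one('li[project-data="xyz"]')
print(missing)
```

Checking for None before indexing avoids a TypeError when the selector matches nothing.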