Unsupported or invalid CSS selector: "_a" when scraping AngelList

Time: 2019-01-08 06:29:03

Tags: python web-scraping beautifulsoup python-requests

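I have been trying to scrape company data for the AngelList companies listed in a CSV file. I want to get each founder's first name, last name, and role title. For this I am using requests with Beautiful Soup. I think something is wrong with my soup selector.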

Here is the current nested class tree:

Here is a sample page URL: https://angel.co/dealflicks

-founders section
--section with_filler with_editable_regions dsss17 startups-show-sections ffs70 founders _a _jm
---dsr31 startup_roles fsp87 startup_profile_group _a _jm
----ul.larger roles
-----li.role
------<<dynamic div>>
-------g-lockup top larger
--------photo
--------text
---------name
---------role_title
---------bio

The code below gives me the error quoted in the title:

import requests
from bs4 import BeautifulSoup, element
req = requests.get('https://angel.co/dealflicks', headers={'User-Agent': 'Mozilla/5.0'})
print(req.status_code)
soup = BeautifulSoup(req.text,"lxml")

founders = soup.select('.founders section .section with_filler with_editable_regions dsss17 startups-show-sections ffs70 founders _a _jm .dsr31 startup_roles fsp87 startup_profile_group _a _jm .larger roles role')

print (founders)

1 answer:

Answer 0 (score: 0)

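This happens because _a is a class, not a tag name such as <_a>. You need to prefix each class name with a dot (.), or pass the classes to findAll() instead: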

soup.findAll('div', class_='section with_filler with_editable_regions dsss17 startups-show-sections ffs70 founders _a _jm')
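
To make the difference concrete, here is a minimal sketch of dot-chained class selectors. The HTML is a hypothetical, simplified fragment modelled on the class tree in the question (the names "Jane Doe" and "John Roe" are made up), and html.parser is used only so the snippet runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup based on the class tree in the question;
# the real AngelList page may be structured differently.
html = """
<div class="founders section">
  <div class="startup_roles startup_profile_group _a _jm">
    <ul class="larger roles">
      <li class="role"><div class="name">Jane Doe</div></li>
      <li class="role"><div class="name">John Roe</div></li>
    </ul>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Classes on the SAME element are chained with dots and no spaces.
group = soup.select("div.startup_roles.startup_profile_group._a._jm")

# A space means "descendant of", so each space-separated part
# still needs its own dot-prefixed classes.
roles = soup.select("div.founders.section ul.larger.roles li.role")

print(len(group), len(roles))
```

A bare word such as `_a` without the leading dot is read as a tag name, which is why the selector in the question is rejected.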

But all you really need is this simple selector:

founders = soup.select('ul.larger.roles li')
print (founders)

Or copy the selector from the Elements panel of your browser's web developer tools:

founders = soup.select('#root > div.page.flush_bottom.dl85.layouts.fhr17.header._a._jm > div > div.content.s-grid.s-grid--outer.u-maxWidthLayout.s-vgBottom2 > div > div.s-flexgrid0.s-flexgrid0--fixed.panes_grid > div.main.pane.s-flexgrid0--footer.s-flexgrid0-colMdW.s-vgPadRight1 > div > div.founders.section > div > div > ul > li')
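
Once a selector returns the li elements, the fields asked for in the question could be pulled out of each one roughly as follows. This is only a sketch: the HTML is a hypothetical fragment based on the class tree in the question, the sample names and titles are invented, and splitting the full name on the first space is an assumption that will not suit every name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the question's class tree (name,
# role_title, bio inside each li.role); the live page may differ.
html = """
<ul class="larger roles">
  <li class="role">
    <div class="text">
      <div class="name">Jane Doe</div>
      <div class="role_title">Founder</div>
      <div class="bio">Example bio</div>
    </div>
  </li>
  <li class="role">
    <div class="text">
      <div class="name">John Roe</div>
      <div class="role_title">Co-Founder</div>
    </div>
  </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

founders = []
for li in soup.select("ul.larger.roles li"):
    name_tag = li.select_one(".name")
    role_tag = li.select_one(".role_title")
    if name_tag is None:
        continue  # skip list items that carry no name
    full_name = name_tag.get_text(strip=True)
    # Assumption: first name is everything before the first space.
    first, _, last = full_name.partition(" ")
    founders.append({
        "first_name": first,
        "last_name": last,
        "role": role_tag.get_text(strip=True) if role_tag else "",
    })

print(founders)
```

If the selectors return an empty list on the real page, it is worth checking whether the founders section is present in req.text at all, since content rendered by JavaScript will not appear in the response from requests.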