Question

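I am scraping a Google results page that returns links to LinkedIn profiles.

I want to collect the links on the page and put them into a Python list.

The problem is that I can't seem to extract them from the page correctly, and I don't know why.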

The Google page displays 10 results like the following:

Mary Smith - Director of Talent Acquisition ...
https://www.linkedin.com › marysmith
Anytown, Arizona 500+ connections ... Experienced Talent Acquisition Director, with a 
demonstrated history of working in the marketing and advertising ...

The source code looks like this:

<div data-hveid="CAIQAA" data-ved="2ahUKEwjLv6HMr4HmAhWluVkKHfjfA1EQFSgAMAF6BAgCEAA">
   <div class="rc"> 
       <div class="r">
           <a href="https://www.linkedin.com/in/marysmith" ping="/url?sa=t&amp;source=web&amp;rct=j&amp;url=https://www.linkedin.com/in/marysmith&amp;ved=2ahUKEwjLv6HMr4HmAhWluVkKHfjfA1EQFjABegQIAhAB">
               <h3 class="LC20lb"><span class="S3Uucc">Mary Smith - Director of Talent Acquisition, Culture Curator ...</span></h3><br>
               <div class="TbwUpd">
                   <cite class="iUh30 bc">https://www.linkedin.com › marysmith</cite>
              </div>
           </a>
           ...

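In my script, I use Selenium and find_element_by_class_name() to collect every instance of a link pointing to LinkedIn. In the example above, one of them is https://www.linkedin.com › marysmith. This is the line of code where I call driver.find_element_by_class_name() with a specific class name: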

linkedin_urls = driver.find_element_by_class_name("iUh30 bc")

But I get the following error:

selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"[name="iUh30 bc"]"}

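I have tried various permutations and other classes, but nothing works. If I use the XPath for one of these links, the script returns that single link.

What am I doing wrong?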

Answer 1

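Sites such as Google and Facebook generate their page source automatically and assign obfuscated class names. These values can change between page loads, which is why you get the "no such element" error. To work around this, match on stable tags or attributes instead.

Try something like this: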

#<cite class="iUh30 bc">https://www.linkedin.com › mary-smith-mckenzie-8b660799</cite>
driver.find_elements_by_xpath("//cite[contains(text(),'›') and contains(text(),'linkedin.com')]")
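
As a rough illustration of matching on a stable attribute rather than a class, the sketch below pulls the href values out of the page source with Python's standard html.parser. It assumes the HTML string comes from something like driver.page_source; LinkedInLinkParser, extract_linkedin_links and the sample markup are hypothetical names made up for this example, not part of Selenium.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkedInLinkParser(HTMLParser):
    """Collect href values of <a> tags whose host is linkedin.com."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        host = urlparse(href).netloc.lower()
        # Match on the href attribute, which is stable, instead of on
        # generated class names such as "iUh30 bc".
        is_linkedin = host == "linkedin.com" or host.endswith(".linkedin.com")
        if is_linkedin and href not in self.links:
            self.links.append(href)


def extract_linkedin_links(page_source):
    """Return the unique LinkedIn hrefs found in an HTML string, in order."""
    parser = LinkedInLinkParser()
    parser.feed(page_source)
    return parser.links


# Hypothetical sample, trimmed from the markup shown in the question.
sample = """
<div class="r">
  <a href="https://www.linkedin.com/in/marysmith" ping="/url?sa=t">
    <cite class="iUh30 bc">https://www.linkedin.com › marysmith</cite>
  </a>
  <a href="https://example.com/other">other</a>
</div>
"""

linkedin_urls = extract_linkedin_links(sample)
print(linkedin_urls)
```

With a live session, passing driver.page_source instead of sample should give the same kind of list, assuming the result anchors still carry the profile URL in href.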

Answer 2

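That method is known to be unreliable with values like this: it expects a single class name, and "iUh30 bc" is two classes separated by a space. Try a CSS selector that chains both classes instead: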

driver.find_element_by_css_selector(".iUh30.bc")
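
Since the question asks for all the links rather than one, here is a minimal sketch of the same idea using the plural find_elements call. compound_class_to_css and find_all_by_classes are hypothetical helper names invented for this example; the only Selenium call used is driver.find_elements(by, value), and "css selector" is the string value of By.CSS_SELECTOR.

```python
def compound_class_to_css(class_attr):
    """Turn a class attribute such as "iUh30 bc" into the selector ".iUh30.bc"."""
    return "".join("." + name for name in class_attr.split())


def find_all_by_classes(driver, class_attr):
    """Return every element that carries all classes listed in class_attr."""
    # "css selector" is the value of selenium.webdriver.common.by.By.CSS_SELECTOR.
    return driver.find_elements("css selector", compound_class_to_css(class_attr))


print(compound_class_to_css("iUh30 bc"))
```

With a live driver, something like [el.text for el in find_all_by_classes(driver, "iUh30 bc")] would then build the Python list, provided the class names are still the ones shown in the question.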
