Question

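I am trying to use Python and Selenium to scrape multiple links from a web page. I am using find_elements_by_xpath and I can locate the list of elements, but I am having trouble turning the returned list into the actual href links. I know find_element_by_xpath works, but that only handles one element.

Here is my code: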

path_to_chromedriver = 'path to chromedriver location'
browser = webdriver.Chrome(executable_path = path_to_chromedriver)

browser.get("file:///path to html file")

all_trails = []

#finds all elements with the class 'text-truncate trail-name' then 
#retrieve the a element
#this seems to be just giving us the element location but not the 
#actual location

find_href = browser.find_elements_by_xpath('//div[@class="text truncate trail-name"]/a[1]')
all_trails.append(find_href)

print all_trails

This code returns:

<selenium.webdriver.remote.webelement.WebElement 
(session="dd178d79c66b747696c5d3750ea8cb17", 
element="0.5700549730549636-1663")>, 
<selenium.webdriver.remote.webelement.WebElement 
(session="dd178d79c66b747696c5d3750ea8cb17", 
element="0.5700549730549636-1664")>,

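I would like the all_trails array to be a list of links, for example: www.google.com, www.yahoo.com, www.bing.com.

I tried looping over the all_trails list and calling the get_attribute('href') method on it, but I get an error saying that the 'list' object has no attribute 'get_attribute'.

Does anyone know how to convert a Selenium WebElement into an href link?

Any help is greatly appreciated :)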

Answer 1

If you have the following HTML:

<div class="text-truncate trail-name">
<a href="http://google.com">Link 1</a>
</div>
<div class="text-truncate trail-name">
<a href="http://google.com">Link 2</a>
</div>
<div class="text-truncate trail-name">
<a href="http://google.com">Link 3</a>
</div>
<div class="text-truncate trail-name">
<a href="http://google.com">Link 4</a>
</div>

Your code should look like this:

all_trails = []

all_links = browser.find_elements_by_css_selector(".text-truncate.trail-name>a")

for link in all_links:

    all_trails.append(link.get_attribute("href"))

Here all_trails is the list of links (the href of Link 1, Link 2, and so on).
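For readers on a newer Selenium: the find_elements_by_* helpers used above were deprecated in Selenium 4 and removed in 4.3, where the equivalent call is find_elements(By.CSS_SELECTOR, ...). Below is a minimal sketch of the same loop in that style. The helper name collect_hrefs is hypothetical, the literal "css selector" is the value behind By.CSS_SELECTOR (written out so the sketch needs no imports), and the two Fake classes are stand-ins that exist only so the sketch runs without a browser.

```python
def collect_hrefs(driver, css_selector):
    """Return the href of every element matching css_selector.

    `driver` is assumed to be a Selenium 4 WebDriver, or any object with the
    same find_elements(by, value) method. collect_hrefs is a hypothetical
    helper name, not part of Selenium.
    """
    # "css selector" is the string value of selenium's By.CSS_SELECTOR.
    elements = driver.find_elements("css selector", css_selector)
    # get_attribute lives on each WebElement, so call it once per element.
    return [element.get_attribute("href") for element in elements]


# --- Stand-ins so the sketch runs without a browser (not Selenium classes) ---
class FakeElement:
    def __init__(self, href):
        self._href = href

    def get_attribute(self, name):
        return self._href if name == "href" else None


class FakeDriver:
    def __init__(self, hrefs):
        self._hrefs = hrefs

    def find_elements(self, by, value):
        # Ignores the selector and returns one fake element per stored href.
        return [FakeElement(h) for h in self._hrefs]


fake = FakeDriver(["http://google.com", "http://google.com"])
all_trails = collect_hrefs(fake, ".text-truncate.trail-name>a")
print(all_trails)
```

With a real driver the call would be collect_hrefs(browser, ".text-truncate.trail-name>a").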

Hope it helps!

Answer 2

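Let's look at what is happening in your code:

Without any visibility into the relevant HTML, it seems the following line returns two WebElements in the list find_href, which is then appended to the all_trails list: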

find_href = browser.find_elements_by_xpath('//div[@class="text truncate trail-name"]/a[1]')

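So when we print the list all_trails, both WebElements are printed. Hence there is no error at that point.

According to the error snapshot you provided, you are trying to call get_attribute("href") on a list, which does not support that method. Hence you see the error:

'list' object has no attribute 'get_attribute'

Solution:

To get the href attribute, we have to iterate over the list as follows: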

find_href = browser.find_elements_by_xpath('//your_xpath')
for my_href in find_href:
    print(my_href.get_attribute("href"))
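To make the failure concrete without a browser, here is a small sketch that reproduces the error and the fix. StubElement is a hypothetical stand-in for a WebElement, not a Selenium class. The sketch also shows a likely reason the asker's own loop failed: all_trails.append(find_href) stores the whole list as a single item, so looping over all_trails yields that inner list rather than the elements.

```python
class StubElement:
    """Hypothetical stand-in for a WebElement; only get_attribute is modelled."""

    def __init__(self, href):
        self._href = href

    def get_attribute(self, name):
        return self._href if name == "href" else None


# find_elements_* returns a plain Python list of elements.
find_href = [StubElement("http://google.com"), StubElement("http://google.com")]

# Calling get_attribute on the list itself raises AttributeError.
try:
    find_href.get_attribute("href")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
print(error_message)

# append() nests the list: all_trails then holds one item, the list itself.
all_trails = []
all_trails.append(find_href)
first_item_is_list = isinstance(all_trails[0], list)
print(first_item_is_list)

# Fix: call get_attribute on each element, for example in a comprehension.
hrefs = [element.get_attribute("href") for element in find_href]
print(hrefs)
```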

Answer 3


find_href = browser.find_elements_by_xpath('//div[@class="text truncate trail-name"]/a[1]')
for i in find_href:
    all_trails.append(i.get_attribute('href'))

get_attribute works on the elements of that list, not on the list itself.
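One thing worth checking in this XPath: @class="..." is an exact string comparison, so "text truncate trail-name" would not match the class "text-truncate trail-name" used in the HTML of Answer 1. The asker's real markup is not shown, so this may not apply to their page. The sketch below illustrates the point with Python's xml.etree.ElementTree, which supports only a subset of XPath and is used here as a stand-in for the browser; the root wrapper element is an assumption added so the snippet parses as XML.

```python
import xml.etree.ElementTree as ET

# The HTML from Answer 1, wrapped in a <root> element (an assumption) so that
# it parses as a single XML document.
HTML = """
<root>
<div class="text-truncate trail-name"><a href="http://google.com">Link 1</a></div>
<div class="text-truncate trail-name"><a href="http://google.com">Link 2</a></div>
<div class="text-truncate trail-name"><a href="http://google.com">Link 3</a></div>
<div class="text-truncate trail-name"><a href="http://google.com">Link 4</a></div>
</root>
"""

root = ET.fromstring(HTML.strip())


def hrefs_for(class_value):
    """Return the href of each <a> directly under a div whose class attribute
    equals class_value exactly."""
    xpath = ".//div[@class='{}']/a".format(class_value)
    return [a.get("href") for a in root.findall(xpath)]


matching = hrefs_for("text-truncate trail-name")    # hyphenated, as in the HTML
mismatched = hrefs_for("text truncate trail-name")  # space instead of hyphen
print(len(matching), len(mismatched))
```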

Answer 4

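Either use the singular form, find_element_by_css_selector, instead of find_elements_by_css_selector, or keep the plural form and loop: the plural call returns many WebElements in a list, so you need to iterate over each WebElement before you can read an attribute.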

'list' object has no attribute 'get_attribute' while iterating through WebElements
