Question

Using Selenium with Python.

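My page is full of links with the class name item-title. I'm trying to iterate over the page and compile a list of all the link texts and their accompanying href attributes, then write the titles and links to a CSV file. Here is my code: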

myLinks=driver.find_elements_by_class_name("item-title")
for link in myLinks:
    out.write(link.text)
    out.write (",") 
    out.write(link.get_attribute("href"))
    out.write("\n")

The line that writes the href value raises the following error:

TypeError: expected a character buffer object

So I tried this instead:

myLinks=driver.find_elements_by_class_name("item-title")
for link in myLinks:
    out.write(link.text)
    out.write (",") 
    out.write(str(link.get_attribute("href")))
    out.write("\n")

The error went away and the link text comes through correctly, but now every href comes out as 'None'.
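Both symptoms have the same cause: Selenium's get_attribute returns None when the element has no such attribute or property, and the element being matched here (the div) has no href. write() only accepts strings, so None raises TypeError, while str(None) silently turns it into the text 'None'. A minimal offline sketch of that behaviour, using an in-memory file instead of a WebDriver; the href variable is a hypothetical stand-in for the get_attribute("href") result. (Python 2 reports "expected a character buffer object"; Python 3 words the message differently, but it is the same TypeError.)

```python
import io

# Hypothetical stand-in for div.get_attribute("href"): the div has no href, so Selenium returns None.
href = None

out = io.StringIO()
try:
    out.write(href)  # write() only accepts str, so this raises TypeError
except TypeError as exc:
    print("TypeError:", exc)

out.write(str(href))  # no error, but the "link" written is now the literal text "None"
print(repr(out.getvalue()))
```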

Edit to add the HTML:

<div class="item-title">
    <span class="icons-pinned"></span>
    <span class="icons-solved"></span>
    <span class="icons-locked"></span>
    <span class="icons-moved"></span>
    <span class="icons-type"></span>
    <span class="icons-reply"></span>
    <a href="/mylink">My title</a>
</div>

I think I see the problem now. The link is a child element of the div, so I need to target that instead, don't I?
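That reading of the HTML can be checked offline. A sketch using only Python's standard-library html.parser (not Selenium) that records each start tag's attributes from the snippet: the href appears on the <a> child, never on the div. The AttrRecorder class is a hypothetical helper written for this illustration.

```python
from html.parser import HTMLParser

HTML = """
<div class="item-title">
    <span class="icons-pinned"></span>
    <span class="icons-solved"></span>
    <span class="icons-locked"></span>
    <span class="icons-moved"></span>
    <span class="icons-type"></span>
    <span class="icons-reply"></span>
    <a href="/mylink">My title</a>
</div>
"""


class AttrRecorder(HTMLParser):
    """Records (tag, attributes) for every start tag, in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


parser = AttrRecorder()
parser.feed(HTML)

div_attrs = next(attrs for tag, attrs in parser.tags if tag == "div")
a_attrs = next(attrs for tag, attrs in parser.tags if tag == "a")

print(div_attrs.get("href"))  # None: the div itself carries no href
print(a_attrs.get("href"))    # /mylink: the href lives on the <a> child
```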

Answer 1

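Based on the HTML you shared, the link text and the href attribute are not on the nodes matched by find_elements_by_class_name("item-title"); the div has no href, which is why get_attribute("href") gave you None. They are on the descendant <a> tag. So instead of find_elements_by_class_name("item-title"), use find_elements_by_xpath or find_elements_by_css_selector to select the <a> elements directly, as follows: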

Using find_elements_by_css_selector:

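myLinks=driver.find_elements_by_css_selector("div.item-title > a")
for link in myLinks:
    # the <a> element's innerHTML is its title text, e.g. "My title"
    out.write(link.get_attribute("innerHTML"))
    out.write(",")
    # href is read from the <a> itself, not from the surrounding div
    out.write(link.get_attribute("href"))
    out.write("\n")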
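Note that the find_elements_by_* helpers were removed in Selenium 4.3; on current versions the same lookup goes through driver.find_elements(by, value). A sketch assuming Selenium 4: the strings "css selector" and "xpath" are the values of By.CSS_SELECTOR and By.XPATH (real code would normally import By from selenium.webdriver.common.by and pass those constants). The collect_links name and its skip-if-no-href behaviour are illustrative choices, not part of the original answer.

```python
# Selenium 4 locator strategies: By.CSS_SELECTOR == "css selector", By.XPATH == "xpath".
CSS_SELECTOR = "css selector"
XPATH = "xpath"


def collect_links(driver, by=CSS_SELECTOR, locator="div.item-title > a"):
    """Return (title, href) pairs for every <a> directly inside div.item-title."""
    rows = []
    for link in driver.find_elements(by, locator):
        href = link.get_attribute("href")
        if href is None:  # skip anchors without an href rather than writing "None"
            continue
        rows.append((link.text, href))
    return rows
```

Usage would be collect_links(driver) for the CSS selector, or collect_links(driver, XPATH, "//div[@class='item-title']/a") for the XPath variant. Selenium typically returns the resolved absolute URL for href, so expect something like https://…/mylink rather than the literal /mylink.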

Using find_elements_by_xpath:

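myLinks=driver.find_elements_by_xpath("//div[@class='item-title']/a")
for link in myLinks:
    # the <a> element's innerHTML is its title text, e.g. "My title"
    out.write(link.get_attribute("innerHTML"))
    out.write(",")
    # href is read from the <a> itself, not from the surrounding div
    out.write(link.get_attribute("href"))
    out.write("\n")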
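Both loops build the CSV by hand with out.write(","), which produces a broken row as soon as a title itself contains a comma. A sketch using the standard-library csv module instead, which quotes such fields automatically. The write_links_csv name and the title,href header row are illustrative choices; the rows would be the (text, href) pairs gathered from myLinks.

```python
import csv
import io


def write_links_csv(rows, out):
    """Write (title, href) pairs to an open text file as CSV, with a header row."""
    writer = csv.writer(out)
    writer.writerow(["title", "href"])
    writer.writerows(rows)


# Demo with an in-memory file: the title containing a comma gets quoted.
buf = io.StringIO()
write_links_csv([("Hello, world", "/mylink")], buf)
print(buf.getvalue())
```

For a real file, open it with open("links.csv", "w", newline="", encoding="utf-8"); the csv module documentation recommends newline="" so the writer's own \r\n line endings are not translated a second time.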

Finding the href attribute with find_elements_by_class_name in Selenium using Python
