Question

I need to scrape the phone number from pages like this one: http://m.avito.ru/sankt-peterburg/muzykalnye_instrumenty/shesti_strunnoe_bandzho_stroy_gitarnyy_203671253

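The blue button on the left makes an ajax request that returns the phone number. The number is not in the link's innerText. It sits in a separate text() node, like this: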

<a href="tel:895**49****" class="button-text action-link" title="Телефон продавца" rel="nofollow">
"8 9** **9-99-**"
</a>

I click the button, wait 3-5 seconds and try to read the number, but I can't get it with .text like this:

phone = driver.find_element_by_class_name('button-text')
print phone.text

It only returns an empty string. When I try this:

print driver.find_element_by_xpath('/html/body/section/article/section[2]/ul/li[1]/a/text()')

or

print driver.find_element_by_xpath('/html/body/section/article/section[2]/ul/li[1]/a/text()').text

it raises InvalidSelectorException: Message: u'Error Message => \'The result of the xpath expression "/html/body/section/article/section[2]/ul/li[1]/a/text()" is: [object Text]. It should be an element.'

Answer 1

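I ran into a similar problem a few days ago and found that the text method returns an empty string if the element is not visible. You can use JavaScript to scroll to the element: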

driver.execute_script("arguments[0].scrollIntoView(true);", element)
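If scrolling doesn't help, another commonly used workaround is to read the element's DOM textContent property, which does not depend on whether Selenium considers the element displayed. A minimal sketch, assuming element is a Selenium WebElement (anything with a .text attribute and a get_attribute() method works); the helper name visible_or_dom_text is hypothetical:

```python
def visible_or_dom_text(element):
    """Return the element's rendered text, falling back to the DOM textContent.

    Assumes a Selenium-style WebElement: .text is the rendered text, which
    Selenium reports as "" for elements it does not consider displayed, while
    get_attribute("textContent") reads the raw DOM property instead.
    """
    text = element.text
    if text:
        return text.strip()
    # Hidden element: fall back to the DOM property (None becomes "").
    dom_text = element.get_attribute("textContent") or ""
    return dom_text.strip()
```

For example: print(visible_or_dom_text(driver.find_element_by_class_name('button-text'))).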

Note: there are several elements with the button-text class on the page. If you want all of them in a list, you can do this:

phone = driver.find_elements_by_class_name('button-text')
phonenums = []
for p in phone:
    p.click()  # reveal the number
    # .text can come back empty for elements not in view, so scroll first
    driver.execute_script("arguments[0].scrollIntoView(true);", p)
    phonenums.append(p.text)
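The loop reads p.text right after the click, so it may capture the button label before the number has loaded, while the question's fixed 3-5 second sleep may wait longer than needed. A minimal sketch of polling instead, assuming the element's text changes from its initial label once the number arrives; wait_for_text_change and its timeout/poll parameters are hypothetical, and Selenium's own WebDriverWait can serve the same purpose:

```python
import time


def wait_for_text_change(element, initial, timeout=5.0, poll=0.25):
    """Poll element.text until it is non-empty and differs from `initial`.

    Returns the new text, or raises TimeoutError if it never changes.
    `element` is assumed to expose a Selenium-style .text attribute.
    """
    deadline = time.monotonic() + timeout
    while True:
        text = element.text.strip()
        if text and text != initial:
            return text
        if time.monotonic() >= deadline:
            raise TimeoutError(f"text still {text!r} after {timeout}s")
        time.sleep(poll)
```

Inside the loop, store label = p.text before p.click() and append wait_for_text_change(p, label) instead of p.text.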

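However, if you just want to scrape pages like this, I would try a completely different approach. The page doesn't appear to make an ajax request, so you should be able to fetch the source with the requests library and parse it. If you really need or want to use Selenium, I would have it grab the source (source = driver.page_source) and parse that with lxml.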
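The answer suggests requests plus lxml; the sketch below swaps in the standard library's html.parser so it runs without extra packages. It assumes the phone link looks like the snippet in the question (an a element whose class includes button-text and whose href starts with tel:); TelLinkParser and extract_phones are hypothetical names. In practice the html argument would be requests.get(url).text or driver.page_source.

```python
from html.parser import HTMLParser


class TelLinkParser(HTMLParser):
    """Collect (href, text) pairs for <a class="... button-text ..."> links
    whose href uses the tel: scheme."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (href, text) pairs
        self._href = None  # href of the matching <a> currently open, if any
        self._chunks = []  # text nodes seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        href = attrs.get("href") or ""
        if "button-text" in classes and href.startswith("tel:"):
            self._href = href
            self._chunks = []

    def handle_data(self, data):
        # Bare text nodes (like the one holding the number) arrive here.
        if self._href is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse the surrounding newlines and runs of whitespace.
            text = " ".join("".join(self._chunks).split())
            self.links.append((self._href, text))
            self._href = None


def extract_phones(html):
    parser = TelLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links
```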

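I should also point out that your final error comes from the XPath selecting a text node rather than an element: find_element_by_xpath can only return elements, so it raises before .text is ever called. I'm fairly sure Selenium can't retrieve the text of a /text() XPath directly anyway.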
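One way around this is to evaluate the XPath in the browser and return a string rather than a node, using execute_script with the standard DOM document.evaluate() call. A minimal sketch, assuming driver is a Selenium WebDriver; the helper text_by_xpath is hypothetical, and with XPathResult.STRING_TYPE the result is the text of the first matching node:

```python
# Evaluated in the page: run the XPath and convert the result to a string,
# which for a node-set is the string value of the first matching node.
XPATH_STRING_JS = (
    "return document.evaluate(arguments[0], document, null, "
    "XPathResult.STRING_TYPE, null).stringValue;"
)


def text_by_xpath(driver, xpath):
    """Evaluate `xpath` in the browser and return its string value, stripped.

    Unlike find_element_by_xpath, this accepts expressions ending in text(),
    because the browser hands back a plain string instead of an element.
    """
    value = driver.execute_script(XPATH_STRING_JS, xpath)
    return (value or "").strip()
```

For example: text_by_xpath(driver, '/html/body/section/article/section[2]/ul/li[1]/a/text()').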

Answer 2

Try this XPath rule:

print driver.find_element_by_xpath('//a[contains(@class, "button-text action-link")]/text()')

Demo:

In [3]: print sel.xpath('//a[contains(@class, "button-text action-link")]/text()').extract()[0]
Показать номер

Getting a text node from odd HTML with selenium + python
