Question

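Here's the deal: I have some titles that I scraped from the IMDB website, but the synopsis ("sinopse") element sits in a box with nothing obvious to target (as shown in the image below). I tried it the way shown here, but I don't know how to extract the text, whether with get_attribute or something similar.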

This is the working version:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('http://www.imdb.com/title/tt2731500/')

alo = driver.find_element(By.XPATH, '//div[@itemprop="description"]').text
print(alo)

The image of the HTML

Answer 1

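1) Save the HTML to disk.

2) Get an XML tool such as xmllint and a good tutorial on XPath.

3) Test and debug the XPath until you find a solution.

4) If you have a specific question about XPath, ask it here.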
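
If you would rather stay in Python than use xmllint, the same save-and-test loop can be sketched with the standard library. This is only an illustration: the helper name texts_for_xpath and the temporary file are made up for the example, xml.etree.ElementTree supports just a subset of XPath (there is no text() step, for instance), and it only parses well-formed markup, so a real saved IMDB page would probably need a more lenient HTML parser.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Small well-formed sample standing in for the saved page (assumption:
# the real page is far larger and may not parse as strict XML).
SAMPLE = """<html><body>
<div class="summary_text" itemprop="description">
        This documentary highlights the role of education professionals in the Second Republic of Spain as the first independent and free women who broke with the female prototype of that era.
</div>
</body></html>"""


def texts_for_xpath(html_path, xpath):
    """Return the stripped text of every element matched by xpath."""
    root = ET.parse(html_path).getroot()
    return [(el.text or "").strip() for el in root.findall(xpath)]


# Step 1: save the HTML to disk (a temporary file plays that role here).
saved = Path(tempfile.mkdtemp()) / "page.html"
saved.write_text(SAMPLE, encoding="utf-8")

# Step 3: try an expression and look at what it actually matches.
matches = texts_for_xpath(saved, './/div[@itemprop="description"]')
print(matches)
```

An empty list means the expression matched nothing, which is the signal to go back and adjust it.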

Let's say the element you are interested in is

<div class="summary_text" itemprop="description">
        This documentary highlights the role of education professionals in the Second Republic of Spain as the first independent and free women who broke with the female prototype of that era.
</div>

So the XPath could be

//div[@itemprop="description"]

or, for the English version of the page,

//div[@itemprop="description"]/p/text()
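
One caveat when carrying these expressions back into Selenium: find_element must be given an XPath that selects an element, so an expression ending in /text() is rejected as an invalid selector there, even though it is fine in xmllint. A minimal sketch of one way around that, assuming you read the text from the element afterwards; element_xpath and synopsis_text are hypothetical helper names, not part of Selenium:

```python
def element_xpath(xpath):
    """Drop a trailing /text() step so the expression selects an element."""
    suffix = "/text()"
    return xpath[: -len(suffix)] if xpath.endswith(suffix) else xpath


def synopsis_text(driver, xpath):
    """Locate one element by XPath and return its visible text, stripped."""
    # "xpath" is the string value behind selenium's By.XPATH constant.
    element = driver.find_element("xpath", element_xpath(xpath))
    return element.text.strip()


# Usage with a real browser session (not run here):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get('http://www.imdb.com/title/tt2731500/')
#   print(synopsis_text(driver, '//div[@itemprop="description"]/p/text()'))
```

If .text comes back empty because the element is hidden, element.get_attribute("textContent") is the usual alternative to try.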
