Question

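I'm very new to Selenium, and I'm having trouble selecting the element I want from this website. In this case, I got the XPath using Chrome's "Copy XPath" tool. Basically, I want to extract the CID text from the site (4004 in this case), but my code doesn't seem able to do it. Any help would be greatly appreciated!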

I have also tried a CSS selector, but it returns the same error.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.binary_location = '/Applications/Google Chrome   Canary.app/Contents/MacOS/Google Chrome Canary'

driver = webdriver.Chrome()

chem_name = "D008294"
url = "https://pubchem.ncbi.nlm.nih.gov/#query=" + chem_name
driver.get(url)  


elements = driver.find_elements_by_xpath('//*[@id="collection-results-container"]/div/div/div[2]/ul/li/div/div/div/div[2]/div[2]/div[2]/span/a/span/span')

driver.close()

print(elements.text)

This is the error I get so far: 'list' object has no attribute 'text'

Answer 1

Here is the XPath you can use.

//span[.='Compound CID']//following-sibling::a/descendant::span[2]

Why your script did not work: there are 2 issues in your code.

elements = driver.find_elements_by_xpath('//*[@id="collection-results-container"]/div/div/div[2]/ul/li/div/div/div/div[2]/div[2]/div[2]/span/a/span/span')

driver.close() # <== don't close the browser until you are done with all your steps on the browser or elements

print(elements.text) # <== you cannot get text from a list (Python will throw an error here)

How to fix it:

CID =  driver.find_element_by_xpath("//span[.='Compound CID']//following-sibling::a/descendant::span[2]").text # <== returning the text using find_element (not find_elements)

driver.close()

print(CID) # <== now you can print `CID` even though the browser is closed, because the value is already stored in the variable.
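
For readers on Selenium 4, where the find_element_by_* helpers are deprecated (and removed from 4.3 onward), the same fix can be sketched with the generic find_element(by, value) call. This is a minimal sketch rather than the answerer's own code: the helper name get_cid and the constant CID_XPATH are hypothetical, and the string "xpath" is simply the value behind By.XPATH.

```python
# Hypothetical helper; only find_element(by, value) and .text are Selenium API.
CID_XPATH = "//span[.='Compound CID']//following-sibling::a/descendant::span[2]"


def get_cid(driver):
    """Return the CID text while the browser session is still open."""
    # "xpath" is the string behind selenium.webdriver.common.by.By.XPATH.
    element = driver.find_element("xpath", CID_XPATH)
    # Read .text now: the element cannot be queried after the driver is closed.
    return element.text


# Assumed usage with a real browser (not executed here):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get("https://pubchem.ncbi.nlm.nih.gov/#query=D008294")
#   try:
#       print(get_cid(driver))
#   finally:
#       driver.quit()
```

Because the helper returns a plain string, the caller can close the browser right after the call and still use the value.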

Answer 2

The function driver.find_elements_by_xpath returns a list of elements. You should loop over it to get the text of each element.

Like this:

for ele in elements:
    print(ele.text)

Or, if you only want to match the first element, use the driver.find_element_by_xpath function instead.
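
The difference between the list and its items can be reproduced without a browser. The sketch below uses a hypothetical stand-in class, FakeElement, in place of Selenium's WebElement (the stored values are placeholders), and shows both the error from the question and the loop that avoids it.

```python
from dataclasses import dataclass


@dataclass
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement; only .text is modelled."""
    text: str


# find_elements_* returns a list, even when only one element matches.
elements = [FakeElement("4004"), FakeElement("53790330")]

try:
    elements.text  # the list itself has no .text attribute
except AttributeError as exc:
    error_message = str(exc)

# Each item in the list does have .text, so read it per element.
texts = [ele.text for ele in elements]

print(error_message)
print(texts)
```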

Answer 3

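The XPath that Chrome provides does not always work. First, you need to know how to write an XPath yourself and verify it in the Chrome console.

See these links to help you understand XPaths: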

https://www.guru99.com/xpath-selenium.html

https://www.w3schools.com/xml/xpath_syntax.asp

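In this case, first find the span that contains the text 'Compound CID', then move up to its parent span, then down to the child a/span/span. Something like //span[contains(text(),'Compound CID')]/parent::span/a/span/span.

You also need to use the method that returns a single element and get the text from that. If you use findElements, it returns a list of elements, so you need to loop over them and get the text from each one.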
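
The label-to-parent-to-child path described above can also be checked offline before involving the browser. The sketch below uses Python's xml.etree.ElementTree on a hand-written snippet. The markup is an assumption inferred from the XPaths in these answers, not copied from PubChem, and ElementTree supports only a subset of XPath (no contains() and no parent:: axis), so the expression is rewritten with [.='...'] and .. as stand-ins.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup inferred from the XPaths in the answers (not real PubChem HTML).
SNIPPET = (
    "<div>"
    "<span>"
    "<span>Compound CID</span>"
    '<a href="https://pubchem.ncbi.nlm.nih.gov/compound/4004">'
    '<span class="breakword"><span>4004</span></span>'
    "</a>"
    "</span>"
    "</div>"
)

root = ET.fromstring(SNIPPET)

# Label span -> parent span (..) -> a -> span -> span, as the answer describes.
cid_element = root.find(".//span[.='Compound CID']/../a/span/span")
cid_text = cid_element.text

print(cid_text)
```

To test the full expression against the live page, the Chrome DevTools console accepts $x("<xpath>") and returns the matching nodes.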

Answer 4

xpath: //a[contains(@href,'compound')]/span[@class='breakword']/span

You can use 'href' as the attribute reference, since I noticed that it has a unique value for each component.

Example: href="https://pubchem.ncbi.nlm.nih.gov/substance/53790330" href="https://pubchem.ncbi.nlm.nih.gov/compound/4004"
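
To illustrate why the href works as a discriminator, the sketch below splits the two example URLs into a record type and an identifier using only Python's urllib.parse. The helper name split_pubchem_href is hypothetical, and the assumption that the last two path segments are always the type and the ID comes from these two examples only.

```python
from urllib.parse import urlparse


def split_pubchem_href(href):
    """Return (record_type, identifier) from the last two path segments."""
    segments = [part for part in urlparse(href).path.split("/") if part]
    return segments[-2], segments[-1]


# The two example links quoted in the answer.
examples = [
    "https://pubchem.ncbi.nlm.nih.gov/substance/53790330",
    "https://pubchem.ncbi.nlm.nih.gov/compound/4004",
]

for href in examples:
    record_type, identifier = split_pubchem_href(href)
    # Only links whose path contains "compound" carry the CID wanted here.
    print(record_type, identifier)
```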
