Question

Currently, I am using selenium to extract some content from the web, as follows:

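from selenium import webdriver

driver = webdriver.Firefox()
driver.get('website.com')
# Collect the matching <a> elements, then keep only their href values.
links = driver.find_elements_by_xpath('''.//*[@id='section-1']//td[1]//a[2]''')
links = [x.get_attribute('href') for x in links]

# (the code that fills lis with the extracted content is not shown)
lis = list()
print(lis)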

Then I print a nested list containing the content:

[["Our culture has gotten too mean and too rough, especially to children and teenagers," she said. "It is never OK when a 12-year-old girl or boy is mocked, bullied or attacked" in the school yard, she argued, but it is "absolutely unacceptable when it is done by someone with no name hiding on the internet."], [Delivering a get-out-the-vote speech in the Philadelphia suburbs on Thursday, Melania Trump pledged to focus on combating online bullying and campaigning for women and children if her husband is elected to the White House.], ["We have to find a better way to talk to each other, to disagree with each other, to respect each other," she said.], [Thursday's speech was Melania Trump's first since she addressed the Republican National Convention in July. That speech was well-received initially, but was quickly overshadowed by the discovery that sections had been plagiarised from First Lady Michelle Obama's address to the 2008 Democratic National Convention.], [An average of polls compiled by the RealClearPolitics website gave her a lead of 1.7 percentage points on Thursday, well down from the solid advantage she had until late last month.]]

My main goal is to add, to each of these inner lists, more content extracted with several other XPaths, like this:

[[<here_goes_more_content_extracted_from_the_site>|"Our culture has gotten too mean and too rough, especially to children and teenagers," she said. "It is never OK when a 12-year-old girl or boy is mocked, bullied or attacked" in the school yard, she argued, but it is "absolutely unacceptable when it is done by someone with no name hiding on the internet."], [<here_goes_more_content_extracted_from_the_site>|Delivering a get-out-the-vote speech in the Philadelphia suburbs on Thursday, Melania Trump pledged to focus on combating online bullying and campaigning for women and children if her husband is elected to the White House.], [<here_goes_more_content_extracted_from_the_site>|"We have to find a better way to talk to each other, to disagree with each other, to respect each other," she said.], [<here_goes_more_content_extracted_from_the_site>|Thursday's speech was Melania Trump's first since she addressed the Republican National Convention in July. That speech was well-received initially, but was quickly overshadowed by the discovery that sections had been plagiarised from First Lady Michelle Obama's address to the 2008 Democratic National Convention.], [<here_goes_more_content_extracted_from_the_site>|An average of polls compiled by the RealClearPolitics website gave her a lead of 1.7 percentage points on Thursday, well down from the solid advantage she had until late last month.]]

Also, I am curious whether there is any way to pass a list or sequence of XPaths to the find_elements_by_xpath() function so that it iterates over them:

content = driver.find_elements_by_xpath(['.//*[@id="accordion"]', './/*[@id="accordion2"]', ..., './/*[@id="accordion"]'])
content = [x.text for x in content]

So, how can I create a list made up of the elements matched by several XPaths, separated by | or some other character?

Answer 1

So, how can I create a list made up of the elements matched by several XPaths, separated by | or some other character?

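You can do this. | is the union operator in XPath. Passing several XPath expressions separated by | returns, in a single list and in document order, the elements that match at least one of the XPaths: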

xpath = ".//foo|.//bar|.//baz"
content = [e.text for e in driver.find_elements_by_xpath(xpath)]
# the matched elements, in document order, would be something like:
# [<foo ../>,<bar ../>,<baz ../>,<bar ../>]
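As a rough sketch of the same idea, the union expression can be built from a list of XPaths with str.join, which also covers the "pass a list of XPaths" part of the question. The helper names union_xpath and texts_for_union are made up for illustration, and find is assumed to be any callable that takes one XPath string and returns elements with a .text attribute, such as driver.find_elements_by_xpath (or, on Selenium releases that no longer provide that method, something like lambda xp: driver.find_elements(By.XPATH, xp)).

```python
def union_xpath(xpaths):
    """Join several XPath expressions with the XPath union operator '|'."""
    return "|".join(xpaths)


def texts_for_union(find, xpaths):
    """Return the text of every element matching at least one XPath.

    `find` is an assumption of this sketch: a callable such as
    driver.find_elements_by_xpath that takes one XPath string and
    returns a list of elements exposing a `.text` attribute.
    """
    return [element.text for element in find(union_xpath(xpaths))]
```

It could then be called as texts_for_union(driver.find_elements_by_xpath, ['.//*[@id="accordion"]', './/*[@id="accordion2"]']). Passing the lookup function in, rather than the driver, keeps the helper independent of which Selenium lookup method is available.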

If you want the results of each XPath in a separate list, then you don't need |:

xpath_list = [".//foo", ".//bar", ".//baz"]
content = [[e.text for e in driver.find_elements_by_xpath(xpath)]
           for xpath in xpath_list]
# the matched elements, grouped per XPath, would be something like:
# [[<foo ../>],[<bar ../>,<bar ../>],[<baz ../>]]
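To get the shape asked for in the question (one inner list per item, with the pieces separated by |), the per-XPath lists can be transposed and joined. This is only a sketch under assumptions: rows_from_xpaths is a hypothetical helper, find is assumed to be a callable like driver.find_elements_by_xpath, and the n-th match of each XPath is assumed to belong to the n-th row, which the page may not guarantee.

```python
from itertools import zip_longest


def rows_from_xpaths(find, xpath_list, sep="|"):
    """Build one single-item list per row, joining the texts found by each XPath."""
    # One list of texts per XPath, in the same order as xpath_list.
    columns = [[element.text for element in find(xpath)] for xpath in xpath_list]
    # Transpose columns into rows; shorter columns are padded with "".
    rows = zip_longest(*columns, fillvalue="")
    return [[sep.join(row)] for row in rows]
```

For example, rows_from_xpaths(driver.find_elements_by_xpath, xpath_list) would give one "first|second|third" string per row. Padding with empty strings keeps rows aligned when one XPath matches fewer elements than the others, but it cannot tell which row a missing value really belongs to.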
