Splinter: getting pieces of text by XPath when they are not unique elements

Time: 2014-12-13 03:00:58

Tags: python html xpath splinter

How can I use Splinter to get the text of the first part of the question, the underline, and the last part, and store each in a variable?

See the HTML at the bottom. I want the following variables to have the following values:

first_part = "Jingle bells, jingle bells, jingle all the"
second_part = "_______"
third_part = "! Oh what fun it is to ride in one-horse open sleigh!"

I tried the following XPaths in an online XPath tester:

//*[@id="question_container"]/div[1]/span/text()[1] #this is first_part
//*[@id="question_container"]/div[1]/span/span      #this is second_part
//*[@id="question_container"]/div[1]/span/text()[2] #this is third_part

and applied them to the HTML below. They returned the desired values in the tester, but in my program Splinter seems to reject them:

first_part = browser.find_by_xpath(xpath = '//*[@id="question_container"]/div[1]/span/text()[1]').text
second_part = browser.find_by_xpath(xpath = '//*[@id="question_container"]/div[1]/span/span').text
third_part = browser.find_by_xpath(xpath = '//*[@id="question_container"]/div[1]/span/text()[2]').text

print first_part
print second_part
print third_part

--------------    OUTPUT     -------------

[]
[]
[]

What am I doing wrong, why is it wrong, and how should I change my code?

The HTML in question, retrieved with Splinter's browser.html (slightly edited to "Jingle Bells" to better convey the problem):

<div id="question_container" style="display: block;">
<div class="question_wrap">

<span class="question">Jingle bells, jingle bells, jingle all the
<span class="underline" style="display: none;">_______</span>
<input type="text" name="vocab_answer" class="answer" id="vocab_answer"></input>
! Oh what fun it is to ride in one-horse open sleigh!</span>

</div></div>

1 answer:

Answer 0 (score: 1)

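The XPath passed to find_by_xpath() must resolve to an element, not a text node. Expressions that end in text() select text nodes, so they cannot be used there.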
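To see why a text node cannot be handed back as an element, it may help to look at how a tree parser stores the question's markup. The sketch below is an illustration only: it uses the standard library's xml.etree.ElementTree rather than Splinter or lxml, and that works here only because the snippet from the question happens to be well-formed XML. The SNIPPET constant and the variable names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# The question's <span>, copied from the HTML in the question. It parses as
# XML only because this particular snippet happens to be well-formed.
SNIPPET = (
    '<span class="question">Jingle bells, jingle bells, jingle all the\n'
    '<span class="underline" style="display: none;">_______</span>\n'
    '<input type="text" name="vocab_answer" class="answer" id="vocab_answer"></input>\n'
    '! Oh what fun it is to ride in one-horse open sleigh!</span>'
)

question = ET.fromstring(SNIPPET)

# Only elements appear as children; the surrounding text is not among them.
child_tags = [child.tag for child in question]

# The text hangs off the elements instead: .text holds what comes before the
# first child, and .tail holds what follows a child's closing tag.
first_part = question.text.strip()
second_part = question.find("span").text
third_part = question.find("input").tail.strip()

print(child_tags)
print(first_part)
print(second_part)
print(third_part)
```

The three pieces of text never show up as nodes of their own in the child list, which is the same reason an element-finding call has nothing to return for a text() expression.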

One option is to find the outer span, get its html, and feed it to lxml.html:

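from lxml.html import fromstring

# Locate the outer <span class="question">; it is an element, so Splinter can find it
element = browser.find_by_xpath(xpath='//div[@id="question_container"]//span[@class="question"]')

# Parse that element's HTML with lxml, whose xpath() can return text nodes
root = fromstring(element.html)
first_part = root.xpath('./text()[1]')[0]        # text before the nested <span>
second_part = root.xpath('./span/text()')[0]     # text inside the nested <span>
third_part = root.xpath('./text()[last()]')[0]   # text after the <input>

print first_part, second_part, third_part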

Prints:

Jingle bells, jingle bells, jingle all the
_______ 
! Oh what fun it is to ride in one-horse open sleigh!
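
If adding lxml as a dependency is not an option, the same idea (take the outer element's HTML from Splinter, then split the text offline) can be sketched with the standard library's html.parser. This is a swapped-in technique, not what the answer above uses. The QuestionTextSplitter class, its VOID_TAGS set, and the INNER_HTML string (assumed to stand in for the markup inside the question span) are hypothetical names for illustration.

```python
from html.parser import HTMLParser


class QuestionTextSplitter(HTMLParser):
    """Collect non-empty text chunks together with their nesting depth."""

    # Void elements may appear without a closing tag, so they are ignored
    # when tracking depth. This short set is an assumption, not a full list.
    VOID_TAGS = {"input", "br", "img", "hr", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.chunks = []  # list of (depth, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in self.VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append((self.depth, text))


# Stand-in for the markup inside <span class="question"> from the question.
INNER_HTML = (
    'Jingle bells, jingle bells, jingle all the\n'
    '<span class="underline" style="display: none;">_______</span>\n'
    '<input type="text" name="vocab_answer" class="answer" id="vocab_answer">\n'
    '! Oh what fun it is to ride in one-horse open sleigh!'
)

parser = QuestionTextSplitter()
parser.feed(INNER_HTML)
parser.close()  # flushes any text still buffered after the last tag

first_part, second_part, third_part = [text for _, text in parser.chunks]
print(first_part)
print(second_part)
print(third_part)
```

Chunks at depth 0 are the text directly inside the question span, and the chunk at depth 1 is the hidden underline. Stripping each chunk also removes the stray newlines and spaces visible in the printed output above.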