How to scrape nested data using selenium and Python

Time: 2017-04-13 05:42:24

Tags: python-2.7 selenium xpath

I basically want to scrape Litigation Paralegal, which is under <h3 class="Sans-17px-black-85%-semibold">, and Olswang, which is under <span class="pv-entity__secondary-title Sans-15px-black-55%">, but I can't get to them. Here is the HTML:

    <div class="pv-entity__summary-info">
      <h3 class="Sans-17px-black-85%-semibold">Litigation Paralegal</h3>
      <h4>
        <span class="visually-hidden">Company Name</span>
        <span class="pv-entity__secondary-title Sans-15px-black-55%">Olswang</span>
      </h4>
      <div class="pv-entity__position-info detail-facet m0">
        <h4 class="pv-entity__date-range Sans-15px-black-55%">
          <span class="visually-hidden">Dates Employed</span>
          <span>Feb 2016 – Present</span>
        </h4>
        <h4 class="pv-entity__duration de Sans-15px-black-55% ml0">
          <span class="visually-hidden">Employment Duration</span>
          <span class="pv-entity__bullet-item">1 yr 2 mos</span>
        </h4>
        <h4 class="pv-entity__location detail-facet Sans-15px-black-55% inline-block">
          <span class="visually-hidden">Location</span>
          <span class="pv-entity__bullet-item">London, United Kingdom</span>
        </h4>
      </div>
    </div>

Here is what I have so far in my code using selenium:

    if tree.xpath('//*[@class="pv-entity__summary-info"]'):
        experience_title = tree.xpath('//*[@class="Sans-17px-black-85%-semibold"]/h3/text()')
        print(experience_title)
        experience_company = tree.xpath('//*[@class="pv-position-entity__secondary-title pv-entity__secondary-title Sans-15px-black-55%"]text()')
        print(experience_company)

My output: Experience title:

2 answers:

Answer 0: (score: 0)

Your XPath expressions are incorrect:

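  • //*[@class="Sans-17px-black-85%-semibold"]/h3/text() means the text content of an h3 that is a child of an element whose class attribute is "Sans-17px-black-85%-semibold". Instead, you need

    //h3[@class="Sans-17px-black-85%-semibold"]/text()

    which means the text content of an h3 element whose class attribute is "Sans-17px-black-85%-semibold".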

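  • In //*[@class="pv-position-entity__secondary-title pv-entity__secondary-title Sans-15px-black-55%"]text() you forgot the slash before text() (you need /text(), not just text()). Also, the target span does not have the class name pv-position-entity__secondary-title. You need to use

    //span[@class="pv-entity__secondary-title Sans-15px-black-55%"]/text()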
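
The difference between the two XPath shapes in this answer can be checked offline. The following is a minimal sketch, not the asker's setup: it assumes a trimmed copy of the markup from the question and uses the standard library's xml.etree.ElementTree in place of lxml. ElementTree supports only a subset of XPath and has no text() step, so the element's .text attribute is read instead. The names SNIPPET, wrong, title and company are illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's markup. It is well-formed, so an XML parser accepts it.
SNIPPET = """
<div class="pv-entity__summary-info">
  <h3 class="Sans-17px-black-85%-semibold">Litigation Paralegal</h3>
  <h4>
    <span class="visually-hidden">Company Name</span>
    <span class="pv-entity__secondary-title Sans-15px-black-55%">Olswang</span>
  </h4>
</div>
"""

root = ET.fromstring(SNIPPET)

# Shape from the question: an h3 that is a *child* of the element carrying the class.
# The class sits on the h3 itself, so nothing matches.
wrong = root.findall(".//*[@class='Sans-17px-black-85%-semibold']/h3")

# Corrected shape: the h3 (or span) itself carries the class.
# @class='...' compares the whole attribute string, so it must match exactly.
title = root.find(".//h3[@class='Sans-17px-black-85%-semibold']")
company = root.find(".//span[@class='pv-entity__secondary-title Sans-15px-black-55%']")

print(wrong)
print(title.text)
print(company.text)
```

Because @class="..." is an exact string comparison, adding a class token that is not in the markup (such as pv-position-entity__secondary-title) is enough to make the expression match nothing.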

Answer 1: (score: 0)

You can get both of these easily with CSS selectors, which I find easier to read and understand than XPath.

driver.find_element_by_css_selector("div.pv-entity__summary-info > h3").text
driver.find_element_by_css_selector("div.pv-entity__summary-info span.pv-entity__secondary-title").text

Here . denotes a class name, > denotes a child (exactly one level down), and a space denotes a descendant (any level down).
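
In newer Selenium releases the find_element_by_* helpers are no longer available, and the same lookups go through find_element(by, value). A minimal sketch of that form, under stated assumptions: scrape_experience is a hypothetical helper name, the driver is assumed to have already loaded the profile page, and the locator strategy is written as the plain string "css selector" (the value of By.CSS_SELECTOR) so the snippet does not depend on Selenium at import time.

```python
# By.CSS_SELECTOR in selenium.webdriver.common.by is the plain string "css selector";
# it is spelled out here so the sketch carries no import-time dependency on Selenium.
CSS = "css selector"


def scrape_experience(driver):
    """Return (title, company) for the first experience entry on the loaded page.

    `driver` is assumed to be a Selenium WebDriver (or any object exposing a
    compatible find_element(by, value) method).
    """
    # '>' restricts the match to a direct child of the summary div.
    title = driver.find_element(CSS, "div.pv-entity__summary-info > h3").text
    # A space matches a descendant at any depth (the span is nested inside an h4).
    company = driver.find_element(
        CSS, "div.pv-entity__summary-info span.pv-entity__secondary-title"
    ).text
    return title, company
```

With a real driver this would be called as scrape_experience(driver) after driver.get(...) has loaded the page. find_element raises an exception when nothing matches, so a caller scraping many profiles may want to handle that case.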

Here are some references to help you get started.

CSS Selectors Reference

CSS Selectors Tips

Advanced CSS Selectors