In JavaScript

Time: 2017-02-22 09:34:25

Tags: javascript xpath web-scraping extract screen-scraping

I am able to extract the XPath of any element present in the document, but not its content/value/data. For example:

<div class="container" title="DivA">
DivA
<div id="container" title="#DivB">
    #DivB
    <div title="DivC (div)">
        DivC (div)
    </div>
    <span title="SpanD (span)">
        SpanD (span)
        <ul>
            <li title="Bullet 1">Bullet 1</li>
            <li id="bullet2" title="Bullet 2">Bullet 2 (#bullet2)</li>
            <li title="Bullet 3">Bullet 3</li>
        </ul>
    </span>
    <img src="favicon.ico">
    <a href="http://google.com/">Dummy Href</a>
</div>
</div>

  

I need the content between the tags. For example:

var path = "//*[@id='container']/span[1]/ul[1]/li[3]"; // path to Bullet 3
var data = path.value; // or anything that would help extract the data
// var data would be "Bullet 3"

Screenshot of the XPath extraction: Xpath for image without @src

1 Answer:

Answer 0: (score: 3)

You can use document.evaluate to accomplish this.

Like this:

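// STRING_TYPE returns the string value of the first node matched by the expression.
var li = document.evaluate("//*[@id='container']/span[1]/ul[1]/li[1]", document, null, XPathResult.STRING_TYPE, null);
// Selecting an attribute node (@href) yields the attribute's value.
var a = document.evaluate("//*[@id='container']/span[1]/a/@href", document, null, XPathResult.STRING_TYPE, null);
console.log(li.stringValue); // text content of the first <li>
console.log(a.stringValue);  // value of the link's href attribute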
<div class="container" title="DivA">
DivA
<div id="container" title="#DivB">
    #DivB
    <div title="DivC (div)">
        DivC (div)
    </div>
    <span title="SpanD (span)">
        SpanD (span)
        <a href="https://google.com">Google</a>
        <ul>
            <li title="Bullet 1">Bullet 1</li>
            <li id="bullet2" title="Bullet 2">Bullet 2 (#bullet2)</li>
            <li title="Bullet 3">Bullet 3</li>
        </ul>
    </span>
</div>
</div>
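
If you want the element itself rather than just a string (for example to read textContent and trim the surrounding whitespace, or to collect every matching node), something along these lines may help. This is only a sketch: the helper names textByXPath and allTextsByXPath are made up for illustration, and the document object is passed in as a parameter so the helpers do not depend on a global. The numeric constants are the standard values of XPathResult.FIRST_ORDERED_NODE_TYPE and XPathResult.ORDERED_NODE_SNAPSHOT_TYPE.

```javascript
// Standard XPathResult constants, spelled out so the helpers do not
// depend on the XPathResult global being in scope.
const FIRST_ORDERED_NODE_TYPE = 9;    // XPathResult.FIRST_ORDERED_NODE_TYPE
const ORDERED_NODE_SNAPSHOT_TYPE = 7; // XPathResult.ORDERED_NODE_SNAPSHOT_TYPE

// Hypothetical helper: trimmed text of the first node matching `xpath`,
// or null when nothing matches.
function textByXPath(doc, xpath, contextNode) {
  const result = doc.evaluate(
    xpath, contextNode || doc, null, FIRST_ORDERED_NODE_TYPE, null);
  const node = result.singleNodeValue;
  return node ? node.textContent.trim() : null;
}

// Hypothetical helper: trimmed text of every node matching `xpath`,
// in document order.
function allTextsByXPath(doc, xpath, contextNode) {
  const snapshot = doc.evaluate(
    xpath, contextNode || doc, null, ORDERED_NODE_SNAPSHOT_TYPE, null);
  const texts = [];
  for (let i = 0; i < snapshot.snapshotLength; i++) {
    texts.push(snapshot.snapshotItem(i).textContent.trim());
  }
  return texts;
}
```

In a browser, with the markup above, textByXPath(document, "//*[@id='container']/span[1]/ul[1]/li[3]") should give you the text of the third bullet, and allTextsByXPath(document, "//*[@id='container']/span[1]/ul[1]/li") should give you the text of all three list items.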