XPath expression to extract addresses from this HTML

Time: 2018-01-04 21:01:49

Tags: html xpath

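I need to extract each of the following 3 addresses, which appear before the phone numbers in this horrible HTML, but I am completely stumped.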

I thought I would query:

<div class='additional-locations collapsible'>
    <div class='row'>
        <div class='location'>
             CompanyName<br /> 123 Some Street<br />City Province PostalCode<br />Country<br /><strong>Phone:</strong>123 456 7890<br /><strong>Fax:</strong> 123 456 7890
            <br />
            <strong>County:</strong> County<br />
            <strong>Electoral District:</strong> 01<br />

            <hr />

            CompanyName<br /> 546 SomeOther Street<br />City Province PostalCode<br />Country<br /><strong>Phone:</strong>123 456 7890<br /><strong>Fax:</strong> 123 456 7890
            <br />
            <strong>County:</strong> County<br />
            <strong>Electoral District:</strong> 02<br />

            <hr />

            CompanyName<br /> 378 Another Street<br />City Province PostalCode<br />Country<br /><strong>Phone:</strong>123 456 7890<br /><strong>Fax:</strong> 123 456 7890
            <br />
            <strong>County:</strong> County<br />
            <strong>Electoral District:</strong> 03<br />
        </div>
    </div>
</div>

My plan was to grab the text that precedes each phone number, but I can't seem to figure it out. Can anyone help?

1 answer:

Answer 0 (score: 1)

Since you have added the xpath-2.0 tag, try the XPath expression below, which should work in XPath 2.0 to get the required data:

for $i in //div[@class='location']/text()[normalize-space()="CompanyName"]
    return $i/string-join(following-sibling::text()[position()<4], ", ")

Output: for each CompanyName text node, one string made of the next three sibling text nodes (street, city line, country), joined with ", ".
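
For readers without an XPath 2.0 processor, the logic of the expression above (find each CompanyName text node, then join the next three sibling text nodes) can be sketched in Python with the standard library. This is only an illustration, not the answerer's method. It assumes the markup is well-formed enough to parse as XML, the function name `extract_addresses` and the shortened sample markup are hypothetical, and unlike `string-join` it trims whitespace around each part.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the markup from the question (assumed to be well-formed XML).
HTML = """
<div class='additional-locations collapsible'>
  <div class='row'>
    <div class='location'>
      CompanyName<br /> 123 Some Street<br />City Province PostalCode<br />Country<br /><strong>Phone:</strong>123 456 7890<br /><strong>Fax:</strong> 123 456 7890
      <br />
      <hr />
      CompanyName<br /> 546 SomeOther Street<br />City Province PostalCode<br />Country<br /><strong>Phone:</strong>123 456 7890<br /><strong>Fax:</strong> 123 456 7890
      <br />
      <hr />
      CompanyName<br /> 378 Another Street<br />City Province PostalCode<br />Country<br /><strong>Phone:</strong>123 456 7890<br /><strong>Fax:</strong> 123 456 7890
      <br />
    </div>
  </div>
</div>
""".strip()


def extract_addresses(markup, marker="CompanyName", parts=3):
    """Return one joined string per address block (hypothetical helper)."""
    root = ET.fromstring(markup)
    addresses = []
    for div in root.iter("div"):
        if div.get("class") != "location":
            continue
        # In ElementTree, the text nodes that are direct children of an element
        # are its .text plus the .tail of each child, in document order.
        nodes = [div.text] + [child.tail for child in div]
        nodes = [n for n in nodes if n]  # drop missing (None) and empty nodes
        for i, node in enumerate(nodes):
            # Equivalent of normalize-space() = "CompanyName"
            if " ".join(node.split()) == marker:
                # Equivalent of following-sibling::text()[position() < 4]
                following = nodes[i + 1 : i + 1 + parts]
                addresses.append(", ".join(part.strip() for part in following))
    return addresses


if __name__ == "__main__":
    for address in extract_addresses(HTML):
        print(address)
```

Run against the sample markup, this prints one line per location, starting with `123 Some Street, City Province PostalCode, Country`. Matching on the literal CompanyName text is as fragile here as in the XPath version, so real pages would need a more robust marker.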