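I am looking for a way to extract a specific kind of URL from HTML like the sample below. The only unique identifier here is the value of the data-spec-code attribute, for example PROROC and KROROC.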
<section data-spec-code="PROROC" only-child="">
<div class="test-class">
<div only-child="" class=" col-sm-4 col-md-3 col-lg-3 show-top-border hidden-xs tech-spec-title-container stack-0">
<div class="test-class-title">
<h5 class="top-offset-10 bottom-offset-0 force-bold-font"><span>Data </span></h5>
</div>
</div>
<div only-child="" class=" col-sm-4 col-md-3 col-lg-3 show-top-border hidden-xs tech-spec-title-container stack-1">
<div class="test-class-title">
<!----><em> <span class="hidden">Data </span></em>
</div>
</div>
<div only-child="" class=" col-sm-4 col-md-3 col-lg-3 show-top-border hidden-xs tech-spec-title-container stack-2">
<div class="test-class-title">
<!----><em> <span class="hidden">Data </span></em>
<!----><small class="help-me-choose-link helpmechoosestyle"><a href="//www.url-i-want-to-extract.com" target="_blank">URL 1</a></small></div>
</div>
</div>
</section>
<section data-spec-code="KROROC" only-child="">
<div class="test-class">
<div only-child="" class=" col-sm-4 col-md-3 col-lg-3 show-top-border hidden-xs tech-spec-title-container stack-0">
<div class="test-class-title">
<h5 class="top-offset-10 bottom-offset-0 force-bold-font"><span>Data 2</span></h5>
</div>
</div>
<div only-child="" class=" col-sm-4 col-md-3 col-lg-3 show-top-border hidden-xs tech-spec-title-container stack-1">
<div class="test-class-title">
<!----><em> <span class="hidden">Data 2</span></em>
</div>
</div>
<div only-child="" class=" col-sm-4 col-md-3 col-lg-3 show-top-border hidden-xs tech-spec-title-container stack-2">
<div class="test-class-title">
<!----><em> <span class="hidden">Data 2</span></em>
<!----><small class="help-me-choose-link helpmechoosestyle"><a href="//www.2nd-url-i-want-to-extract.com" target="_blank">URL 2</a></small></div>
</div>
</div>
</section>
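I have put together some code based on research on Stack Overflow and Google, but so far I can only extract every link on the page, or fetch elements with the getElementsBy... methods.
I can't use those options because the hyperlink is nested inside other tags and there are too many hyperlinks on the page. I also tried querySelector, but without success.
I would appreciate any suggestions or guidance on how to achieve this.
Here is my expected result: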
PROROC www.url-i-want-to-extract.com
KROROC www.2nd-url-i-want-to-extract.com
Answer 0 (score: 0)
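In addition to the description of your code, it would help to see the actual code.
You can start with an attribute selector to target the elements that carry those attribute=value pairs (the section elements with a data-spec-code attribute), and then grab the a tag nested inside each one.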
(abc)
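To illustrate the idea: in a browser DOM, a CSS selector such as section[data-spec-code="PROROC"] a passed to querySelector would match the link inside that section. Since the question does not show which language is being used, here is a minimal sketch of the same approach in Python using only the standard-library html.parser. The names SpecLinkParser and extract_spec_links are made up for this example, and it assumes the sections are not nested and that only the first link in each section is wanted.

```python
from html.parser import HTMLParser


class SpecLinkParser(HTMLParser):
    """Collect the first <a href> found inside each <section data-spec-code=...>."""

    def __init__(self):
        super().__init__()
        self.current_code = None  # data-spec-code of the section being parsed
        self.links = {}           # spec code -> extracted URL

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href")
        if tag == "section" and "data-spec-code" in attrs:
            self.current_code = attrs["data-spec-code"]
        elif tag == "a" and self.current_code and href:
            # Keep only the first link per section and drop the leading
            # slashes of the protocol-relative "//host" form.
            self.links.setdefault(self.current_code, href.lstrip("/"))

    def handle_endtag(self, tag):
        # Leaving a section: links seen after this point are ignored
        # until the next section with a data-spec-code starts.
        if tag == "section":
            self.current_code = None


def extract_spec_links(html):
    """Return a dict mapping each data-spec-code value to its link."""
    parser = SpecLinkParser()
    parser.feed(html)
    return parser.links


# Trimmed-down version of the markup from the question.
sample = """
<section data-spec-code="PROROC" only-child="">
  <small class="help-me-choose-link"><a href="//www.url-i-want-to-extract.com" target="_blank">URL 1</a></small>
</section>
<section data-spec-code="KROROC" only-child="">
  <small class="help-me-choose-link"><a href="//www.2nd-url-i-want-to-extract.com" target="_blank">URL 2</a></small>
</section>
"""

for code, url in extract_spec_links(sample).items():
    print(code, url)
```

Links that sit outside any section with a data-spec-code attribute are skipped, which avoids collecting every hyperlink on the page.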