How can I extract URLs from section tags using VBA?

Asked: 2019-09-18 15:45:16

Tags: html vba web-scraping

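I am looking for a way to extract a specific kind of URL from HTML like the sample below. The only unique identifier is the value of the data-spec-code attribute, for example PROROC and KROROC.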

<section data-spec-code="PROROC" only-child="">
    <div class="test-class">
        <div only-child="" class=" col-sm-4 col-md-3 col-lg-3 show-top-border hidden-xs tech-spec-title-container stack-0">
            <div class="test-class-title">
                <h5 class="top-offset-10 bottom-offset-0 force-bold-font"><span>Data </span></h5>

            </div>
        </div>
        <div only-child="" class=" col-sm-4 col-md-3 col-lg-3 show-top-border hidden-xs tech-spec-title-container stack-1">
            <div class="test-class-title">
                <!----><em>&nbsp;<span class="hidden">Data </span></em>
          </div>
        </div>
        <div only-child="" class=" col-sm-4 col-md-3 col-lg-3 show-top-border hidden-xs tech-spec-title-container stack-2">
            <div class="test-class-title">
                <!----><em>&nbsp;<span class="hidden">Data </span></em>
                <!----><small class="help-me-choose-link helpmechoosestyle"><a href="//www.url-i-want-to-extract.com" target="_blank">URL 1</a></small></div>
        </div>
    </div>
</section>
<section data-spec-code="KROROC" only-child="">
    <div class="test-class">
        <div only-child="" class=" col-sm-4 col-md-3 col-lg-3 show-top-border hidden-xs tech-spec-title-container stack-0">
            <div class="test-class-title">
                <h5 class="top-offset-10 bottom-offset-0 force-bold-font"><span>Data 2</span></h5>
            </div>
        </div>
        <div only-child="" class=" col-sm-4 col-md-3 col-lg-3 show-top-border hidden-xs tech-spec-title-container stack-1">
            <div class="test-class-title">
                <!----><em>&nbsp;<span class="hidden">Data 2</span></em>
            </div>
        </div>
        <div only-child="" class=" col-sm-4 col-md-3 col-lg-3 show-top-border hidden-xs tech-spec-title-container stack-2">
            <div class="test-class-title">
                <!----><em>&nbsp;<span class="hidden">Data 2</span></em>
                <!----><small class="help-me-choose-link helpmechoosestyle"><a href="//www.2nd-url-i-want-to-extract.com" target="_blank">URL 2</a></small></div>
        </div>
    </div>
</section>

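I put my code together from research on Stack Overflow and Google, but I can only extract every link on the page, either all at once or with the getElementsBy* methods.

I can't use those options because the hyperlinks are nested inside other tags and the page contains too many hyperlinks. I also tried querySelector, but it failed.

I'm hoping to get some suggestions or guidance on how to achieve this.

Here is my expected result: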

PROROC www.url-i-want-to-extract.com

KROROC www.2nd-url-i-want-to-extract.com

1 answer:

Answer 0: (score: 0)

Besides a description of your code, it would help to see the actual code.

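You can start with an attribute selector that targets the elements carrying those attribute=value pairs (data-spec-code="PROROC" and data-spec-code="KROROC"), then grab the a tags nested inside them.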
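
As a rough sketch of that idea (not the asker's code, which was not posted): the routine below assumes doc is an already-loaded HTML document object, for example the .document of an automated browser, running in a document mode that supports querySelector and HTML5 elements such as section. The name PrintSpecLinks and the hard-coded list of codes are hypothetical, chosen only for illustration.

```vba
Option Explicit

' Hypothetical helper: prints "<code> <url>" for each data-spec-code of interest.
' ASSUMPTION: doc is an already-loaded HTML document that supports querySelector.
Public Sub PrintSpecLinks(ByVal doc As Object)
    Dim codes As Variant
    Dim code As Variant
    Dim link As Object
    Dim url As String

    ' The two identifiers mentioned in the question.
    codes = Array("PROROC", "KROROC")

    For Each code In codes
        ' The attribute selector picks the matching section; the descendant
        ' combinator (the space) then reaches the first <a> nested inside it,
        ' however deep it sits.
        Set link = doc.querySelector("section[data-spec-code='" & code & "'] a")

        If Not link Is Nothing Then
            ' Read the href attribute and drop a leading "//" if present.
            url = link.getAttribute("href")
            If Left$(url, 2) = "//" Then url = Mid$(url, 3)
            Debug.Print code & " " & url
        End If
    Next code
End Sub
```

Querying once per code keeps each URL paired with its identifier. If only the links are needed, a single querySelectorAll call with the two selectors joined by a comma should also work, iterated by index with .Length and .item(i).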
