Extracting specific elements/values from HTML

Asked: 2013-07-24 20:57:56

Tags: vba

I am trying to do either a) or b) below. I would prefer a), if I can figure it out. See the HTML at the end.

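a) Extract the values of the following items. The names in quotes are static, but the associated values will change. I only want to extract the values.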

"locality" = Paris
"region" = Paris
"country-name" = France
"latitude" = 48.85534
"longitude" = 2.35048

b) Simply extract the whole element <div class="vcard">...</div>

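I am trying to reuse someone else's code and adapt it to do what I want, but I can't get my head around it. I managed to extract some values, but the result is messy. I think the code could be written much better: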

THE VBA

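' Load the URL stored in cell C1 into the embedded WebBrowser control
Sheet1.WebBrowser1.Navigate Sheet1.Range("C1").Value

' Wait until the page has finished loading
Do
    DoEvents
Loop Until Sheet1.WebBrowser1.ReadyState = READYSTATE_COMPLETE

the_html_code = Sheet1.WebBrowser1.Document.Body.InnerHTML

the_output_row = 2
' Locate the "locality" class name in the page source
start_of_item = InStr(the_html_code, "locality")
' Skip a hard-coded 39 characters to reach the title value (breaks if the whitespace changes)
the_value = Mid(the_html_code, start_of_item + 39, Len(the_html_code))
' Drop everything up to and including "locality" so the next search starts after it
the_html_code = Mid(the_html_code, start_of_item + 8, Len(the_html_code))
' Keep the text up to the next ">" (Chr(62))
the_value = Mid(the_value, 1, InStr(the_value, Chr(62)) - 1)
Sheet1.Range("L" & the_output_row) = the_value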

HTML

    <script>
        if (typeof (aadSponsoredLinksObj) != 'undefined' && aadSponsoredLinksObj.type == 'google' && aadSponsoredLinksObj.show_links == true) {
            document.write('<scr' + 'ipt src="http://pagead2.googlesyndication.com/pagead/show_ads.js"></scr' + 'ipt>');
        } else if (typeof (aadSponsoredLinksObj) == 'undefined') {
            jQuery('#ad-links').remove();
        }
    </script>
<div id="tracking-pixels"></div>

</div>
<!-- /#wrap -->

    <div class="vcard">
        <span class="adr">
            <span class="locality">
                <span class="value-title" title="Paris" ></span>
            </span>
            <abbr class="region" title="Paris">
                <span class="value-title" title="75" ></span>
            </abbr>
            <abbr class="country-name" title="France">
                <span class="value-title" title="FR" ></span>
            </abbr>
        </span>
        <span class="geo">
            <span class="latitude">
                <span class="value-title" title="48.85534" ></span>
            </span>
            <span class="longitude">
                <span class="value-title" title="2.35048"></span>
            </span>
        </span>
    </div>

    <script type="text/javascript">
        var _qoptions = { qacct: 'p-4b4gl_1fWISuU' };
        if (typeof (apgPageInfoObj) != 'undefined' && apgPageInfoObj.crumb_trail) {
            _qoptions.labels = apgPageInfoObj.crumb_trail.join('.');

1 Answer:

Answer 0: (score: 0)

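As David Zemens suggested, you can use the DOM parser in MSXML. You can add a reference to Microsoft XML in the VBA References dialog (it will probably work with the latest version, v6.0). There is an online reference for this library here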
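
As a rough illustration of the MSXML approach, here is a minimal sketch, not a definitive implementation. It combines option b) (cut out the `<div class="vcard">` element) with option a) (read the individual values via XPath). The helper names `ExtractVCard`, `VCardValue` and `WriteVCardValues` are hypothetical and do not appear in the question. The sketch assumes the vcard fragment is well-formed XML exactly as in the sample HTML (quoted attributes, no nested `<div>`). The browser control may normalize `InnerHTML` differently (for example, dropping attribute quotes), in which case the fragment is not found or `loadXML` reports a parse error. It uses late binding, so no reference needs to be added.

```vba
Option Explicit

' Hypothetical helper: cuts <div class="vcard">...</div> out of the page source.
' Assumes there is no nested <div> inside the vcard, as in the sample HTML.
Function ExtractVCard(ByVal html As String) As String
    Dim startPos As Long, endPos As Long
    startPos = InStr(1, html, "<div class=""vcard"">", vbTextCompare)
    If startPos = 0 Then Exit Function
    endPos = InStr(startPos, html, "</div>", vbTextCompare)
    If endPos = 0 Then Exit Function
    ExtractVCard = Mid$(html, startPos, endPos - startPos + Len("</div>"))
End Function

' Hypothetical helper: returns the title for one class name inside the
' vcard fragment, or "" if the class is not found.
Function VCardValue(ByVal vcardXml As String, ByVal className As String) As String
    Dim doc As Object, node As Object, attr As Object

    Set doc = CreateObject("MSXML2.DOMDocument.6.0")   ' late bound MSXML v6.0
    doc.async = False
    If Not doc.loadXML(vcardXml) Then
        Err.Raise vbObjectError + 1, "VCardValue", doc.parseError.reason
    End If

    ' The element carrying the class, e.g. <abbr class="region" title="Paris">
    Set node = doc.selectSingleNode("//*[@class='" & className & "']")
    If node Is Nothing Then Exit Function

    ' Prefer the element's own title; otherwise use the nested value-title span
    Set attr = node.Attributes.getNamedItem("title")
    If attr Is Nothing Then
        Set attr = node.selectSingleNode("*[@class='value-title']/@title")
    End If
    If Not attr Is Nothing Then VCardValue = attr.Text
End Function

' Writes the five values to column L, starting at row 2.
Sub WriteVCardValues()
    Dim fragment As String, names As Variant, i As Long

    fragment = ExtractVCard(Sheet1.WebBrowser1.Document.Body.InnerHTML)
    If Len(fragment) = 0 Then Exit Sub   ' vcard not found in this page

    names = Array("locality", "region", "country-name", "latitude", "longitude")
    For i = LBound(names) To UBound(names)
        Sheet1.Range("L" & (2 + i - LBound(names))).Value = _
            VCardValue(fragment, CStr(names(i)))
    Next i
End Sub
```

Run `WriteVCardValues` after the page-load loop from the question has finished. The element's own `title` is checked first because, in the sample HTML, `region` and `country-name` carry the wanted value ("Paris", "France") on the `<abbr>` itself, while the nested `value-title` span holds the code ("75", "FR"). For `locality`, `latitude` and `longitude` only the nested span has a `title`.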