Unable to use VBA

Time: 2018-05-15 14:09:17

Tags: vba excel-vba text web-scraping excel

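I've written a script in VBA that loads a text file containing HTML elements from my desktop and should print the title that lies within the class named question-hyperlink. When I execute my script, it throws the error "object variable or with block". Where am I going wrong, and what can I do to print the title? Thanks in advance.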

Content of the text file:

<div class="summary">
        <h3><a href="/questions/50348809/javascript-if-class-x-contains-z-get-link-of-class-y" class="question-hyperlink">javascript if class x contains z get link of class y</a></h3>
        <div class="excerpt">
            i'm no js expert but need to execute some js in my applescript. Don't know if this is possible as the html page contains several instances of this div class.
If nested div class ".product_card__title"...
        </div>
        <div class="tags t-javascript t-web-scraping t-applescript">
            <a href="/questions/tagged/javascript" class="post-tag" title="show questions tagged 'javascript'" rel="tag">javascript</a> <a href="/questions/tagged/web-scraping" class="post-tag" title="show questions tagged 'web-scraping'" rel="tag">web-scraping</a> <a href="/questions/tagged/applescript" class="post-tag" title="show questions tagged 'applescript'" rel="tag">applescript</a>
        </div>
        <div class="started fr">
            <div class="user-info ">
    <div class="user-action-time">
        asked <span title="2018-05-15 11:15:30Z" class="relativetime">2 hours ago</span>
    </div>
    <div class="user-gravatar32">
        <a href="/users/6809723/gto"><div class="gravatar-wrapper-32"><img src="https://www.gravatar.com/avatar/5d4e619fab77f9d58ee457a321e48d37?s=32&amp;d=identicon&amp;r=PG" alt="" width="32" height="32"></div></a>
    </div>
    <div class="user-details">
        <a href="/users/6809723/gto">GTO</a>
        <div class="-flair">
            <span class="reputation-score" title="reputation score " dir="ltr">37</span><span title="1 silver badge"><span class="badge2"></span><span class="badgecount">1</span></span><span title="7 bronze badges"><span class="badge3"></span><span class="badgecount">7</span></span>
        </div>
    </div>
</div>
        </div>
    </div>

What I've tried so far:

Sub GetFileFromText()
    Dim HTML As New HTMLDocument, post As Object, strCont$

    Open "C:\Users\WCS\Desktop\content.txt" For Binary As #1
    strCont = Space$(LOF(1))
    Get #1, , strCont
    Close #1
    HTML.body.innerHTML = strCont

    Set post = HTML.getElementsByClassName("question-hyperlink")(0)
    MsgBox post.innerText
End Sub

I also tried it this way, but the result is the same:

Sub GetFileFromText()
    Dim strContent$, HTML As New HTMLDocument, post As Object

    With CreateObject("ADODB.Stream")
        .Charset = "utf-8"
        .Open
        .LoadFromFile ("C:\Users\WCS\Desktop\content.txt")
        strContent = .ReadText()
        HTML.body.innerHTML = strContent
    End With

    Set post = HTML.getElementsByClassName("question-hyperlink")(0)
    MsgBox post.innerText
End Sub

Output I'm expecting:

javascript if class x contains z get link of class y

1 Answer:

Answer 0 (score: 2)

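My comment saying the file is UTF-8 encoded was wrong. The strange first characters (two, in your case) define the file's encoding: ÿþ (the bytes FF FE) means 'UTF-16 (little endian)'. These characters are called a 'BOM', or byte order mark. A detailed list can be found at https://en.wikipedia.org/wiki/Byte_order_mark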
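To check which BOM a file really starts with, instead of guessing from how it looks in an editor, a small helper can read the first bytes in binary mode and compare them with the common BOMs (UTF-8: EF BB BF, UTF-16 LE: FF FE, UTF-16 BE: FE FF). This is only a sketch and is not part of the answer; the function name DetectBom and its return strings are hypothetical, chosen for illustration.

```vba
' Hypothetical helper (not from the answer): reports which byte order mark,
' if any, a file starts with. UTF-32 LE (FF FE 00 00) would also match the
' UTF-16 LE test below; it is not distinguished here.
Function DetectBom(ByVal filePath As String) As String
    Dim f As Integer, n As Long
    Dim b() As Byte

    f = FreeFile
    Open filePath For Binary Access Read As #f
    n = LOF(f)
    If n > 3 Then n = 3                ' only the first 3 bytes matter
    If n < 2 Then                      ' too short to hold any BOM
        Close #f
        DetectBom = "none"
        Exit Function
    End If

    ReDim b(0 To n - 1)
    Get #f, 1, b                       ' read the first n bytes into the array
    Close #f

    DetectBom = "none"
    If n = 3 Then
        If b(0) = &HEF And b(1) = &HBB And b(2) = &HBF Then DetectBom = "UTF-8"
    End If
    If b(0) = &HFF And b(1) = &HFE Then DetectBom = "UTF-16 LE"
    If b(0) = &HFE And b(1) = &HFF Then DetectBom = "UTF-16 BE"
End Function
```

For example, Debug.Print DetectBom("C:\Users\WCS\Desktop\content.txt") should print UTF-16 LE for a file that begins with ÿþ (FF FE), as described above.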

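The good news is that ADODB.Stream understands your BOM. Your line .Charset = "utf-8" just confuses it: it makes the stream read the data as UTF-8, which is of course wrong for this file. Simply removing that line should do the trick.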

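To avoid the run-time error, you can check whether the assignment to the post variable succeeded. It may fail, for example, because the file doesn't contain that class at all: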

If post Is Nothing Then
    MsgBox "class not found"
Else
    MsgBox post.innerText
End If
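Putting both suggestions together, a minimal sketch of the corrected routine could look like this. It assumes, as diagnosed above, that content.txt is UTF-16 little endian with an FF FE BOM, and that the VBA project references Microsoft HTML Object Library (needed for the HTMLDocument type the question already uses). Instead of naming a charset, it relies on ADODB.Stream's default Charset, which is "Unicode" (UTF-16 LE).

```vba
' Sketch combining both fixes; assumes a UTF-16 LE file with a BOM and a
' reference to "Microsoft HTML Object Library" (for HTMLDocument).
Sub GetFileFromText()
    Dim HTML As New HTMLDocument, post As Object, strContent As String

    With CreateObject("ADODB.Stream")
        ' No .Charset = "utf-8" line: the default Charset is "Unicode"
        ' (UTF-16 LE), which matches this file.
        .Open
        .LoadFromFile "C:\Users\WCS\Desktop\content.txt"
        strContent = .ReadText()
        .Close
    End With

    HTML.body.innerHTML = strContent

    ' If no element has this class, post is expected to be Nothing,
    ' so check before touching .innerText.
    Set post = HTML.getElementsByClassName("question-hyperlink")(0)
    If post Is Nothing Then
        MsgBox "class not found"
    Else
        MsgBox post.innerText
    End If
End Sub
```

If the file were instead saved as UTF-8, keeping .Charset = "utf-8" (set before .LoadFromFile, as in the question) would be the matching choice; the charset has to follow the file's actual encoding.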