Error parsing XML in VBScript

Posted: 2012-10-05 15:06:43

Tags: xml parsing vbscript xml-parsing

I have this simple VBScript that sends an HTTP POST request and reads the returned HTML response.

Function httpPOST(url, body, username, password )  
  Set Http = CreateObject("Msxml2.ServerXMLHTTP")   
  Http.Open "POST", url, False, username, password  
  Http.setRequestHeader _  
              "Content-Type", _  
              "application/x-www-form-urlencoded"  
  Http.send body 
  pagestatus = Http.status
  if pagestatus<> "200" then
    httpPOST="Error:"& pagestatus
  else
    'httpPOST = Http.ResponseBody
    'httpPOST = Http.responseText
    Set objXMLDoc = CreateObject("MSXML.DOMDocument")
    objXMLDoc.async = False
    objXMLDoc.validateOnParse = False
    objXMLDoc.load(Http.ResponseBody)
    Set objNode = objXMLDoc.selectSingleNode("/html/body/center/img")
    httpPost = objNode.getAttribute("alt") 
  end if
End Function

The HTML response looks like this:

<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
        <title>---</title>
    </head>
    <body>
        <center>
            <img alt="You are now connected" src="pages/GEN/connected_gen.png">
        </center>
    </body>
</html>

The problem with this script is that it always fails with Error: Object required: 'objNode'

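I have tried many variations of the XML parser, and each time I end up giving up after running into yet another error related to the XML object.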

1 answer:

Answer 0 (score: 2)

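To tackle your first problem: .load expects 'a string containing a URL that specifies the location of the XML file', so use .loadXml instead. Then check whether Http.ResponseBody contains data that MSXML?.DOMDocument can parse (your second problem).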
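A minimal sketch of that advice, assuming the response is meant to be well-formed XML: pass the body as a string (.responseText; .ResponseBody is a byte array) to .loadXML, check its return value, and test the selected node for Nothing, since calling getAttribute on Nothing is what produces the question's "Object required: 'objNode'" error. The function name AltFromXml is hypothetical; the ProgID and the XPath are the ones used in this thread.

```vbscript
' Hypothetical helper: parse a response string with loadXML and return the
' img alt text, or an "Error: ..." string instead of failing with
' "Object required".
Function AltFromXml(sXml)
  Dim doc : Set doc = CreateObject("MSXML2.DOMDocument")
  doc.async = False
  doc.validateOnParse = False
  doc.setProperty "SelectionLanguage", "XPath"
  ' loadXML takes the markup itself (not a URL) and returns False on a parse error
  If Not doc.loadXML(sXml) Then
    ' parseError.reason ends with a line break; strip it for display
    AltFromXml = "Error: " & Replace(doc.parseError.reason, vbCrLf, "")
    Exit Function
  End If
  Dim node : Set node = doc.selectSingleNode("/html/body/center/img")
  ' selectSingleNode returns Nothing when the XPath matches no node
  If node Is Nothing Then
    AltFromXml = "Error: no /html/body/center/img node"
  Else
    AltFromXml = node.getAttribute("alt")
  End If
End Function
```

In the question's httpPOST, the Else branch would then shrink to httpPOST = AltFromXml(Http.responseText). With the HTML shown in the question, this yields an "Error: ..." string describing the parse failure instead of crashing.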

Update:

Something that "works" (and why):

  Dim sHTML : sHTML = readAllFromFile("..\data\02.html")
  WScript.Echo sHTML
  Dim oXDoc : Set oXDoc = CreateObject("MSXML2.DOMDocument")
  oXDoc.async = False
  oXDoc.validateOnParse = False
  oXDoc.setProperty "SelectionLanguage", "XPath"
  If oXDoc.loadXML(sHTML) Then
     Dim ndImg : Set ndImg = oXDoc.selectSingleNode("/html/body/center/img")
     Dim httpPost : httpPost = ndImg.getAttribute("alt")
     WScript.Echo "ok", httpPost
  Else
     WScript.Echo "Error: " & trimWS(oXDoc.parseError.reason)
  End If

Output:

<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
        <title>---</title>
    </head>
    <body>
        <center>
            <img alt="You are now connected" src="pages/GEN/connected_gen.png"/>
        </center>
    </body>
</html>

ok You are now connected

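The snippet above calls two helpers, readAllFromFile and trimWS, that the answer does not show. The versions below are hypothetical stand-ins, assuming readAllFromFile should return a whole text file as a string and trimWS should strip leading and trailing whitespace (parseError.reason ends with a line break). They are only meant to make the snippet runnable under cscript.

```vbscript
' Hypothetical stand-in: return the entire contents of a text file.
Function readAllFromFile(sPath)
  Dim oFS : Set oFS = CreateObject("Scripting.FileSystemObject")
  ' 1 = ForReading (the constant is not predefined in VBScript)
  Dim oTS : Set oTS = oFS.OpenTextFile(sPath, 1)
  readAllFromFile = oTS.ReadAll()
  oTS.Close
End Function

' Hypothetical stand-in: strip leading/trailing spaces, tabs, CR and LF.
Function trimWS(s)
  Dim oRE : Set oRE = New RegExp
  oRE.Pattern = "^\s+|\s+$"
  oRE.Global = True
  trimWS = oRE.Replace(s, "")
End Function
```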
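MSXML2.DOMDocument's .loadXML will load (and parse) HTML, provided it is "XML-valid", i.e. well-formed XML. Yours isn't, because the img tag is not closed. This is the error message I get for your original code: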

Error: End tag 'center' does not match the start tag 'img'.

How to proceed from here depends on whether you are able (or willing) to change the HTML.
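If the HTML cannot be changed at the source, one crude option is to patch the string before handing it to .loadXML. The sketch below (closeImgTags is a hypothetical name) rewrites unclosed img tags as self-closing ones with a VBScript RegExp. It does nothing about other void elements such as br, or about entities like &nbsp;, either of which would still make loadXML fail.

```vbscript
' Hypothetical pre-processing: turn <img ...> into <img .../> so that
' loadXML can accept it. Tags that already end in "/>" are left unchanged,
' because the last character before ">" must not be "/".
Function closeImgTags(sHTML)
  Dim oRE : Set oRE = New RegExp
  oRE.Pattern = "<img(\s[^>]*[^/>])>"
  oRE.Global = True
  oRE.IgnoreCase = True
  closeImgTags = oRE.Replace(sHTML, "<img$1/>")
End Function
```

For the HTML in the question, oXDoc.loadXML(closeImgTags(sHTML)) should then succeed, but any change in the page markup can break this again.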

Update II:

While you could do nasty things to .ResponseBody before you feed it to .loadXML, why not use an HTML tool to parse HTML:

  Dim sHTML : sHTML = readAllFromFile("..\data\01.html")
  WScript.Echo sHTML
  Dim oHF : Set oHF = CreateObject("HTMLFILE")
  oHF.write sHTML
  Dim httpPost : httpPost = oHF.documentElement.childNodes(1).childNodes(0).childNodes(0).alt
  WScript.Echo "ok", httpPost

Output:

<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
        <title>---</title>
    </head>
    <body>
        <center>
            <img alt="You are now connected" src="pages/GEN/connected_gen.png">
        </center>
    </body>
</html>

ok You are now connected

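As the output shows, HTMLFILE accepts your img tag even though it is not closed in the XML sense. Of course, the way of getting at what you actually want (the hard-coded childNodes chain) should be made more robust.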
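A less brittle variant of the HTMLFILE approach, assuming the value you need is on the first img element in the page: look the element up with getElementsByTagName instead of the childNodes(1).childNodes(0).childNodes(0) path, which depends on the exact nesting of the page. AltFromHtml is a hypothetical name.

```vbscript
' Hypothetical helper: load HTML into an htmlfile document and return the
' alt text of the first <img>, or an "Error: ..." string if there is none.
Function AltFromHtml(sHTML)
  Dim oHF : Set oHF = CreateObject("htmlfile")
  oHF.write sHTML
  oHF.close  ' finish writing so the document is complete
  Dim colImg : Set colImg = oHF.getElementsByTagName("img")
  If colImg.length = 0 Then
    AltFromHtml = "Error: no img element"
  Else
    AltFromHtml = colImg.item(0).getAttribute("alt")
  End If
End Function
```

Because htmlfile parses HTML rather than XML, the unclosed img needs no pre-processing here.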