I am looking at @OmegaStripes' answer to the question How to get a particular InnerText from a specific class?, which uses the Split function with specified delimiter strings to extract an href from .responseBody.
I then tried to replicate this to extract the following href:
"https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/02/New-AmbSYS-to-2018-Jan.csv"
from NHS England's Ambulance Quality Indicators page.
HTML snippet:
<main class="main group" role="main">
<div class="page-content" id="main-content">
<header>
<h1>Ambulance Quality Indicators</h1>
</header>
<article class="rich-text">
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>
<p><strong>CSV Data</strong><br>
These files have the same data as other published spreadsheets, but without any formatting:<br>
<a href="https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/02/New-AmbSYS-to-2018-Jan.csv" class="csv-link" onclick="ga('send', 'event', 'Downloads', 'CSV', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/02/New-AmbSYS-to-2018-Jan.csv');">New Systems Indicators August 2017 to January 2018 (CSV, 23KB)</a><br>
</article>
</div>
</main>
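Problem:
The response text I receive looks like this:
Sample response text:
From a little research, and from the references listed at the end of this question, I am guessing this may be an encoding problem?
I tried setting a request header with .setRequestHeader: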
.setRequestHeader "Content-Type", _
"application/x-www-form-urlencoded; charset=UTF-8"
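This had no effect on the output.
To be honest, I have not found a way to resolve this.
Any suggestions on how I can get the expected response text, i.e. text from which I can parse the href of interest?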
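Context
This is part of a larger piece of work:
1) I want to grab the CSV link (its name changes every month) without a browser window popping up
2) Download the target file's contents
3) Write out the binary file using ADODB.Stream.
@OmegaStripes outlined this process when answering my question Return focus to ThisWorkbook.Activesheet after XMLHTTP60 file download. I am trying to understand and implement that suggestion.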
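Steps 2 and 3 could be sketched roughly as follows. This is only an illustration under assumptions, not @OmegaStripes' code: the procedure name DownloadBinary and its parameters fileUrl and savePath are hypothetical, and the constants are declared locally because the objects are late bound.

```vba
Option Explicit

' Hypothetical helper (name and parameters are assumptions, not from the linked answer):
' downloads fileUrl and saves the raw bytes to savePath.
Public Sub DownloadBinary(ByVal fileUrl As String, ByVal savePath As String)
    Const adTypeBinary As Long = 1
    Const adSaveCreateOverWrite As Long = 2
    Dim http As Object

    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", fileUrl, False   ' synchronous request
    http.send

    If http.Status <> 200 Then
        Err.Raise vbObjectError + 513, "DownloadBinary", _
                  "Unexpected HTTP status " & http.Status
    End If

    ' Write the response bytes unchanged; no text decoding is involved here.
    With CreateObject("ADODB.Stream")
        .Type = adTypeBinary
        .Open
        .Write http.responseBody
        .SaveToFile savePath, adSaveCreateOverWrite
        .Close
    End With
End Sub
```

Because the stream stays in binary mode, the CSV is saved byte for byte, so no character set has to be chosen for the download itself.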
Code:
Option Explicit
Public Const url As String = "https://www.england.nhs.uk/statistics/statistical-work-areas/ambulance-quality-indicators/"
Public aBody As String
Sub Testing()
' Download via XHR
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", url, False
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded; charset=utf-8"
.send
' Get binary response content
aBody = .responseBody
End With
ActiveSheet.Range("A1") = aBody
End Sub
References:
1) XMLHTTP and Special Characters (e.g. accents)
2) setRequestHeader Method (IXMLHTTPRequest)
Answer 0 (score: 0)
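So, credit to @FlorentB for this solution, and a shout-out to @OmegaStripes for the suggestions.
As suggested, the problem was indeed that .responseBody returns a byte array encoded as UTF-8. As pointed out, I was assigning it directly to a String (which VBA stores as UTF-16), so the bytes were reinterpreted as UTF-16 code units, hence all those foreign characters.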
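A minimal sketch of that reinterpretation, assuming nothing beyond built-in VBA (the procedure name ShowByteReinterpretation is hypothetical and only for illustration):

```vba
Option Explicit

' Hypothetical demo, not part of the original code.
Public Sub ShowByteReinterpretation()
    Dim bytes() As Byte
    Dim s As String

    ReDim bytes(0 To 1)
    bytes(0) = &H41   ' "A" as a single byte (ASCII, so also valid UTF-8)
    bytes(1) = &H42   ' "B"

    ' Assigning a Byte array to a String copies the bytes unchanged,
    ' so each pair of bytes is read as one UTF-16LE code unit.
    s = bytes

    Debug.Print Len(s)          ' 1: two bytes became one character
    Debug.Print Hex$(AscW(s))   ' 4241: neither "A" nor "B"
End Sub
```

The same thing happens when .responseBody is assigned to a String variable: pairs of UTF-8 bytes are packed into single UTF-16 characters, which is why the bytes need to be decoded explicitly.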
I used @Tomalak's BytesToString function, with a minor change, to handle the conversion to a string.
Code:
Option Explicit
Public Const url As String = "https://www.england.nhs.uk/statistics/statistical-work-areas/ambulance-quality-indicators/"
Public aBody As String 'this is causing the conversion
Const adTypeBinary As Byte = 1
Const adTypeText As Byte = 2
Const adModeReadWrite As Byte = 3
Public Const strPath As String = "C:\Users\User\Desktop\testXMLHTTPOutput"
Public Sub Testing()
' Download via XHR
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", url, False
.send
' Decode the UTF-8 response bytes into a VBA (UTF-16) String
aBody = BytesToString(.responseBody, "UTF-8")
End With
Dim fso As Object 'late binding
Set fso = CreateObject("Scripting.FileSystemObject")
Dim oFile As Object
Set oFile = fso.CreateTextFile(strPath)
oFile.WriteLine aBody
oFile.Close
Set fso = Nothing
Set oFile = Nothing
End Sub
'ADODB.Stream with stream.CharSet = "UTF-8"
'http://msdn.microsoft.com/en-us/library/windows/desktop/ms675032%28v=vs.85%29.aspx
Public Function BytesToString(ByVal bytes As Variant, ByVal charset As String) As String
With CreateObject("ADODB.Stream")
.Mode = adModeReadWrite
.Type = adTypeBinary
.Open
.Write bytes
.Position = 0
.Type = adTypeText
.charset = charset
BytesToString = .ReadText
End With
End Function
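With the page decoded into a proper String, the href could then be pulled out with Split, along the lines of the approach mentioned in the question. The following is a sketch under assumptions rather than a tested solution: the function name ExtractCsvLink is hypothetical, and it assumes the link's markup keeps the shape shown in the HTML snippet, with the href attribute immediately before class="csv-link".

```vba
Option Explicit

' Hypothetical helper: returns the href of the first link whose tag contains
' class="csv-link", or an empty string when the markers are not found.
' Assumes href appears before class in the tag, as in the snippet above.
Public Function ExtractCsvLink(ByVal html As String) As String
    Const CLASS_MARKER As String = "class=""csv-link"""
    Const HREF_MARKER As String = "href="""
    Dim beforeClass As String
    Dim parts() As String
    Dim tail As String

    If InStr(1, html, CLASS_MARKER, vbBinaryCompare) = 0 Then Exit Function

    ' Everything before the class attribute; the wanted href is the last one in it.
    beforeClass = Split(html, CLASS_MARKER, 2)(0)
    If InStr(1, beforeClass, HREF_MARKER, vbBinaryCompare) = 0 Then Exit Function

    parts = Split(beforeClass, HREF_MARKER)
    tail = parts(UBound(parts))
    If Len(tail) = 0 Then Exit Function

    ' Cut the value at its closing double quote.
    ExtractCsvLink = Split(tail, """")(0)
End Function
```

It could be called as ExtractCsvLink(aBody) after Testing has populated aBody. Plain string splitting is brittle if the page markup changes, so an empty result should be treated as "link not found" rather than as an error in the download.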
Other useful links: