Download a web page's HTML as a UTF-8 string

Date: 2016-08-24 22:09:03

Tags: html vb.net utf-8

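I want to download the inner HTML of an online page, but when I do, characters such as šđčćž are replaced by garbage like ć¡ and so on.

The code I am using: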

Dim sourceString As String = New System.Net.WebClient().DownloadString("SomeWebPage")
TextBox1.Text = sourceString

2 answers:

Answer 0: (score: 2)

You probably have to download the raw bytes and then convert them to UTF-8 with the Encoding class:

Async Function GetHtmlString(address As String) As Task(Of String)
    Using client As New WebClient
        ' Download the raw bytes, then decode them explicitly as UTF-8
        Dim bytes = Await client.DownloadDataTaskAsync(address)
        Dim s = Encoding.UTF8.GetString(bytes)
        Return s
    End Using
End Function

Thanks to @dave's comment, there is a simpler way:

Async Function GetHtmlString(address As String) As Task(Of String)
    Using client As New WebClient
        ' Tell WebClient which encoding to use when it decodes the response
        client.Encoding = Encoding.UTF8
        Dim s = Await client.DownloadStringTaskAsync(address)
        Return s
    End Using
End Function

Usage example:

Imports System.Net
Imports System.Text

Public Class Form1
    Private Async Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
        Dim s = Await GetHtmlString("http://www.radiomerkury.pl/")
    End Sub

    Async Function GetHtmlString(address As String) As Task(Of String)
        Using client As New WebClient
            client.Encoding = Encoding.UTF8
            Dim s = Await client.DownloadStringTaskAsync(address)
            Return s
        End Using
    End Function
End Class
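
To illustrate why the characters come out garbled in the first place, here is a minimal sketch that needs no network access. It assumes the page is served as UTF-8 and that the bytes were decoded with some single-byte encoding instead; ISO-8859-1 is used here purely as a stand-in for whatever encoding was actually applied on the asker's machine (WebClient.Encoding defaults to Encoding.Default). The module name MojibakeDemo is hypothetical, and the source file is assumed to be saved as UTF-8 so that the string literal survives compilation.

```vbnet
Imports System
Imports System.Text

Module MojibakeDemo
    Sub Main()
        ' The characters from the question
        Dim original As String = "šđčćž"

        ' What a UTF-8 page actually sends over the wire: two bytes per character here
        Dim utf8Bytes As Byte() = Encoding.UTF8.GetBytes(original)

        ' Decoding with the wrong (single-byte) encoding turns every byte into
        ' its own character, which is what produces the garbled text.
        ' ISO-8859-1 is an assumption used only for illustration.
        Dim wrong As String = Encoding.GetEncoding("ISO-8859-1").GetString(utf8Bytes)

        ' Decoding with the encoding the bytes were written in restores the text
        Dim right As String = Encoding.UTF8.GetString(utf8Bytes)

        Console.WriteLine("Characters in original: " & original.Length.ToString())
        Console.WriteLine("UTF-8 bytes:            " & utf8Bytes.Length.ToString())
        Console.WriteLine("Wrong decode matches:   " & (wrong = original).ToString())
        Console.WriteLine("UTF-8 decode matches:   " & (right = original).ToString())
    End Sub
End Module
```

Both functions above avoid this mismatch in the same way: they make sure the bytes are decoded as UTF-8, either by calling Encoding.UTF8.GetString directly or by setting client.Encoding before the download.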

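Answer 1: (score: 0)

Kibi, I think your approach falls well short. I don't see how VB.NET is going to help you with this kind of problem. Below is a simple, intuitive Excel & VBA solution. I hope it helps you reach your goal.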

Sub DumpData()
    Dim IE As Object, itm As Object
    Dim URL As String
    Dim RowCount As Long

    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True

    URL = "http://finance.yahoo.com/q?s=sbux&ql=1"

    ' Navigate, then wait for the site to fully load
    IE.Navigate2 URL
    Do While IE.Busy Or IE.ReadyState <> 4   ' 4 = READYSTATE_COMPLETE
        DoEvents
    Loop

    With Sheets("Sheet1")
        .Cells.ClearContents
        RowCount = 1
        ' One row per element: tag name, id, class name, first 1024 characters of text
        For Each itm In IE.document.all
            .Range("A" & RowCount) = itm.tagName
            .Range("B" & RowCount) = itm.ID
            .Range("C" & RowCount) = itm.className
            .Range("D" & RowCount) = Left(itm.innerText, 1024)

            RowCount = RowCount + 1
        Next itm
    End With
End Sub