Converting UTF-8 to ANSI?

Asked: 2013-01-07 11:48:05

Tags: vb.net utf-8

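I want to use .NET's WebClient class to download a web page, extract the title (i.e. the content between <title> and </title>), and save the page to a file.

The problem is that the page is encoded in UTF-8, and System.IO.StreamWriter throws an exception when the file name contains such characters.

I've googled and tried several ways of converting UTF-8 to ANSI, to no avail. Does anyone have working code for this?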

'Using WebClient asynchronous downloading
Private Sub AlertStringDownloaded(ByVal sender As Object, 
                                  ByVal e As DownloadStringCompletedEventArgs)
    If e.Cancelled = False AndAlso e.Error Is Nothing Then
        Dim Response As String = CStr(e.Result)

        'Doesn't work               
        Dim resbytes() As Byte = Encoding.UTF8.GetBytes(Response)
        Response = Encoding.Default.GetString(Encoding.Convert(Encoding.UTF8, 
                                              Encoding.Default, resbytes))

        Dim title As Regex = New Regex("<title>(.+?) \(", 
                                       RegexOptions.Singleline)
        Dim m As Match
        m = title.Match(Response)
        If m.Success Then
            Dim MyTitle As String = m.Groups(1).Value

            'Illegal characters in path.
            Dim objWriter As New System.IO.StreamWriter("c:\" & MyTitle & ".txt")
            objWriter.Write(Response)
            objWriter.Close()
        End If
    End If
End Sub

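Edit: Thanks, everyone, for the help. It turns out the error was not caused by UTF-8 but by an LF character hidden in the page's title section, which is evidently an illegal character in a path.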


Edit: Here is a simple way to remove some illegal characters from a file name/path:

Dim MyTitle As String = m.Groups(1).Value
Dim InvalidChars As String = New String(Path.GetInvalidFileNameChars()) + New String(Path.GetInvalidPathChars())
For Each c As Char In InvalidChars
    MyTitle = MyTitle.Replace(c.ToString(), "")
Next

Edit: Here is how to tell WebClient to expect UTF-8:

Dim webClient As New WebClient
AddHandler webClient.DownloadStringCompleted, AddressOf AlertStringDownloaded
webClient.Encoding = Encoding.UTF8
'An absolute URI needs a scheme; New Uri("www.acme.com") throws UriFormatException
webClient.DownloadStringAsync(New Uri("http://www.acme.com"))

1 answer:

Answer 0 (score: 1)

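I don't think this problem has anything to do with UTF-8. I think your regex will include </title> if it appears on the same line. The characters < and > are not valid in Windows file names.

If that is not the problem, it would help to see some sample input and output values for MyTitle.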
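
As a rough illustration of the two points raised in this thread (stop the match at </title>, and strip characters that are illegal in a file name), here is a minimal sketch. The module name, the helpers ExtractTitle and ToSafeFileName, and the sample HTML are all hypothetical and not taken from the question. It also assumes the page contains one simple <title> element.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports System.Text.RegularExpressions

Module TitleToFileName
    ' Hypothetical helper: returns the text between <title> and </title>,
    ' or Nothing when no title element is found.
    Function ExtractTitle(ByVal html As String) As String
        Dim m As Match = Regex.Match(html, "<title[^>]*>(.*?)</title>",
                                     RegexOptions.Singleline Or RegexOptions.IgnoreCase)
        If Not m.Success Then Return Nothing
        Return m.Groups(1).Value.Trim()
    End Function

    ' Hypothetical helper: drops every character the current platform
    ' reports as invalid in a file name (on Windows this includes
    ' control characters such as LF, plus < > : and others).
    Function ToSafeFileName(ByVal title As String) As String
        Dim invalid As Char() = Path.GetInvalidFileNameChars()
        Dim sb As New StringBuilder()
        For Each c As Char In title
            If Array.IndexOf(invalid, c) < 0 Then sb.Append(c)
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' Made-up sample page with an LF hidden inside the title.
        Dim html As String = "<html><head><title>Acme:" & vbLf & " Home</title></head></html>"
        Dim title As String = ExtractTitle(html)
        If title IsNot Nothing Then
            Console.WriteLine(ToSafeFileName(title) & ".txt")
        End If
    End Sub
End Module
```

Because the capture group ends at </title>, the angle brackets never reach the file name, and the filter removes the stray LF before the name is passed to StreamWriter. The exact set of characters removed depends on what Path.GetInvalidFileNameChars() returns on the platform where the code runs.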