Parsing XML with ISO-8859-1 encoding from the web

Time: 2015-03-11 09:10:51

Tags: c# xml web-services encoding

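I need to read an XML file from the web that is encoded in ISO-8859-1. After creating an XmlDocument from it, I tried converting some of its InnerText values to UTF-8, but that did not work. I then tried changing the encoding on the HttpClient side. The response string is now correctly formatted, but when I create the XmlDocument the application crashes with an exception: HRESULT: 0xC00CE55F, or an unexpected character in the XML string. How can I solve this?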

Code snippet:

private static async Task<string> GetResultsAsync(string uri)
        {
            var client = new HttpClient();
            var response = await client.GetByteArrayAsync(uri);
            var responseString = Encoding.GetEncoding("iso-8859-1").GetString(response, 0, response.Length - 1);
            return responseString;
        }

public static async Task GetPodcasts(string url)
        {
            var progrmas = await GetGroupAsync("prog");
            HttpClient client = new HttpClient();

            //Task<string> pedido = client.GetStringAsync(url);
            //string res = await pedido; //Gets the string with the wrong chars, but LoadXml doesn't fail

            res = await GetResultsAsync(url); //Gets the string properly formatted
            XmlDocument doc = new XmlDocument();

            doc.LoadXml(res);  //Crashes here
            XmlElement root = doc.DocumentElement;

            XmlNodeList nodes = root.SelectNodes("//item");

            //Title
            var node_titles = root.SelectNodes("//item/title");
            IEnumerable<string> query_titles = from nodess in node_titles select nodess.InnerText;
            List<string> list_titles = query_titles.ToList();
            //........

            for (int i = 0; i < list_titles.Count; i++)
            {
                PodcastItem podcast = new PodcastItem();
                string title = list_titles[i];


                //First attempt to convert a field from the XmlDocument, with the wrong chars. Only replaces the bad encoding with a '?':

                //Encoding iso = Encoding.GetEncoding("ISO-8859-1");
                //Encoding utf8 = Encoding.UTF8;
                //byte[] utfBytes = utf8.GetBytes(title);
                //byte[] isoBytes = Encoding.Convert(utf8, iso, utfBytes);
                //string msg = iso.GetString(isoBytes, 0, isoBytes.Length - 1);

                PodcastItem dataItem = new PodcastItem(title + pubdate, title, link, description, "", pubdate);
                progrmas.Items.Add(dataItem);
            }

        }

1 answer:

Answer 0 (score: 1)

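I'm not sure why you are fiddling with the encoding yourself, but the most likely reason for your crash is that you drop the last byte of the array: passing `response.Length - 1` as the count means the final byte is never decoded. This code works for me: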

    static async Task<string> LoadDecoded()
    {
        var client = new HttpClient();
        var response = await client.GetByteArrayAsync("http://www.rtp.pt/play/podcast/469");
        var responseString = Encoding
           .GetEncoding("iso-8859-1")
           .GetString(response, 0, response.Length); // no -1 here, we want all bytes!
        return responseString;
    }
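
To illustrate why the missing byte matters, here is a minimal offline sketch. It assumes the desktop `System.Xml.XmlDocument` (not the WinRT class) and uses a made-up sample document in place of the real feed; `TruncationDemo`, `SampleBytes` and `TryParse` are hypothetical names. If the document ends with `>`, dropping the last byte removes that character and the parser should reject the string.

```csharp
using System;
using System.Text;
using System.Xml;

static class TruncationDemo
{
    static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Hypothetical sample document standing in for the real feed.
    public static byte[] SampleBytes()
    {
        return Latin1.GetBytes(
            "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" +
            "<rss><item><title>Programa\u00E7\u00E3o</title></item></rss>");
    }

    // Decodes the first `count` bytes as ISO-8859-1 and reports
    // whether the resulting string parses as XML.
    public static bool TryParse(byte[] bytes, int count)
    {
        string text = Latin1.GetString(bytes, 0, count);
        try
        {
            new XmlDocument().LoadXml(text);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    static void Main()
    {
        byte[] bytes = SampleBytes();
        // Decoding every byte keeps the closing '>' of the root element.
        Console.WriteLine("all bytes parse: " + TryParse(bytes, bytes.Length));
        // Decoding Length - 1 bytes cuts that '>' off.
        Console.WriteLine("Length - 1 parses: " + TryParse(bytes, bytes.Length - 1));
    }
}
```

If the real response happens to end in a newline or other whitespace, losing the last byte would be harmless, so this only shows the failure mode, not proof of what the feed returns.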

If I let HttpClient figure out the encoding, your code also works for me:

    static async Task<string> Load()
    {
        var hc = new HttpClient();
        string s = await hc.GetStringAsync("http://www.rtp.pt/play/podcast/469");
        return s;
    }

    static void Main(string[] args)
    {

        var xd = new XmlDocument();
        string res = Load().Result;
        xd.LoadXml(res);
        var node_titles = xd.DocumentElement.SelectNodes("//item/title");

        Console.WriteLine(node_titles.Count);
    }
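
As far as I know, when HttpClient decodes a body to a string it relies on the charset declared in the Content-Type header and falls back to UTF-8 when none is given. The sketch below shows that behaviour offline with `HttpContent.ReadAsStringAsync`; the body text and the names `CharsetHeaderDemo` and `Decode` are made up for illustration, and I have not checked which headers the real feed sends.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;

static class CharsetHeaderDemo
{
    // Wraps the bytes in an HttpContent with the given charset
    // (null means no charset parameter) and lets the framework decode it.
    public static string Decode(byte[] body, string charset)
    {
        var content = new ByteArrayContent(body);
        content.Headers.ContentType =
            new MediaTypeHeaderValue("text/xml") { CharSet = charset };
        return content.ReadAsStringAsync().Result;
    }

    static void Main()
    {
        // Hypothetical fragment containing non-ASCII characters.
        const string original = "<title>Programa\u00E7\u00E3o</title>";
        byte[] body = Encoding.GetEncoding("iso-8859-1").GetBytes(original);

        // With the charset declared, the text should round-trip unchanged.
        Console.WriteLine("declared charset matches: " +
            (Decode(body, "iso-8859-1") == original));

        // Without a charset the bytes are read as UTF-8, so the
        // accented characters are expected to come back damaged.
        Console.WriteLine("no charset matches: " +
            (Decode(body, null) == original));
    }
}
```

If a server omits the charset, this could be one explanation for getting a string with the wrong characters from `GetStringAsync`.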

If you are not on mobile/WinRT, XmlDocument.Load accepts a stream, and that works just as well:

    static async Task<Stream> LoadStream()
    {
        var hc = new HttpClient();
        var stream = await hc.GetStreamAsync("http://www.rtp.pt/play/podcast/469");
        return stream;
    }

    static void Main(string[] args)
    {

        var xd2 = new XmlDocument();
        xd2.Load(LoadStream().Result);

        var node_titles2 = xd2.DocumentElement.SelectNodes("//item/title");

        Console.WriteLine(node_titles2.Count);
    }
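
The stream approach works because the XML parser reads the raw bytes and picks the encoding from the XML declaration itself. A minimal offline sketch, again assuming the desktop `System.Xml.XmlDocument` and a made-up sample document (`StreamDecodingDemo` and `LoadTitle` are hypothetical names):

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class StreamDecodingDemo
{
    // Loads the raw bytes through a stream and returns the first item title.
    // No Encoding class is involved on the reading side.
    public static string LoadTitle(byte[] bytes)
    {
        var doc = new XmlDocument();
        using (var stream = new MemoryStream(bytes))
        {
            doc.Load(stream);
        }
        return doc.SelectSingleNode("//item/title").InnerText;
    }

    static void Main()
    {
        // Hypothetical sample document, encoded as ISO-8859-1 like the feed.
        byte[] bytes = Encoding.GetEncoding("iso-8859-1").GetBytes(
            "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" +
            "<rss><item><title>Programa\u00E7\u00E3o</title></item></rss>");

        // Compare against the expected title instead of printing it,
        // so the result does not depend on the console code page.
        Console.WriteLine("title decoded correctly: " +
            (LoadTitle(bytes) == "Programa\u00E7\u00E3o"));
    }
}
```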

This is the result in my console: [image: console output of the encoded XML]

Are you sure you are not encoding the text somewhere else as well?

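As general advice: the framework classes can handle most common encoding schemes. Try to let them do the work before reaching for the Encoding classes yourself.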