我如何更改此XML字符串以便XDocument.Parse读取它?

时间:2010-01-15 09:58:52

标签: c# xml xml-serialization linq-to-xml

在以下代码中,我序列化将对象转换为XML 字符串

但是当我尝试使用XDocument.Parse将此XML字符串读入XDocument 时,它会给我错误

  

根级别的数据无效。

XML是:

<?xml version="1.0" encoding="utf-8"?>
<Customer xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <Id>1</Id>
   <FirstName>Jim</FirstName>
   <LastName>Jones</LastName>
   <ZipCode>23434</ZipCode>
</Customer>

更新:以下是十六进制:

![alt text][1] Mod编辑 - 禁用超链接:指向恶意软件的链接

我该如何处理这个XML,以便它在没有错误的情况下读入XDocument?

using System;
using System.Collections.Generic;
using System.Xml.Serialization;
using System.IO;
using System.Xml;
using System.Text;
using System.Xml.Linq;

namespace TestSerialize2342
{
    class Program
    {
        static void Main(string[] args)
        {
            List<Customer> customers = Customer.GetCustomers();

            Console.WriteLine("--- Serializing ------------------");

            foreach (var customer in customers)
            {
                Console.WriteLine("Serializing " + customer.GetFullName() + "...");
                string xml = XmlHelpers.SerializeObject<Customer>(customer);

                XDocument xdoc = XDocument.Parse(xml);

            }

            Console.ReadLine();
        }

    }

    public static class StringHelpers
    {
        public static String UTF8ByteArrayToString(Byte[] characters)
        {
            UTF8Encoding encoding = new UTF8Encoding();
            String constructedString = encoding.GetString(characters);
            return (constructedString);
        }

        public static Byte[] StringToUTF8ByteArray(String pXmlString)
        {
            UTF8Encoding encoding = new UTF8Encoding();
            Byte[] byteArray = encoding.GetBytes(pXmlString);
            return byteArray;
        } 
    }

    public static class XmlHelpers
    {
        public static string SerializeObject<T>(object o)
        {
            MemoryStream ms = new MemoryStream();
            XmlSerializer xs = new XmlSerializer(typeof(T));
            XmlTextWriter xtw = new XmlTextWriter(ms, Encoding.UTF8);
            xs.Serialize(xtw, o);
            ms = (MemoryStream)xtw.BaseStream;
            return StringHelpers.UTF8ByteArrayToString(ms.ToArray());
        }

        public static T DeserializeObject<T>(string xml)
        {
            XmlSerializer xs = new XmlSerializer(typeof(T));
            MemoryStream ms = new MemoryStream(StringHelpers.StringToUTF8ByteArray(xml));
            XmlTextWriter xtw = new XmlTextWriter(ms, Encoding.UTF8);
            return (T)xs.Deserialize(ms);
        }
    }

    public class Customer
    {
        public int Id { get; set; }
        public string FirstName { get; set; }
        public string LastName { get; set; }
        public string Street { get; set; }
        public string Location { get; set; }
        public string ZipCode { get; set; }

        private int internalValue = 23;

        public static List<Customer> GetCustomers()
        {
            List<Customer> customers = new List<Customer>();
            customers.Add(new Customer { Id = 1, FirstName = "Jim", LastName = "Jones", ZipCode = "23434" });
            customers.Add(new Customer { Id = 2, FirstName = "Joe", LastName = "Adams", ZipCode = "12312" });
            customers.Add(new Customer { Id = 3, FirstName = "Jack", LastName = "Johnson", ZipCode = "23111" });
            customers.Add(new Customer { Id = 4, FirstName = "Angie", LastName = "Reckar", ZipCode = "54343" });
            customers.Add(new Customer { Id = 5, FirstName = "Henry", LastName = "Anderson", ZipCode = "16623" });
            return customers;
        }

        public string GetFullName()
        {
            return FirstName + " " + LastName + "(" + internalValue + ")";
        }

    }
}

解答:

感谢Andras,GetPreamble()修复它,所以对于其他任何处理此问题的人来说,这里有一个清理BOM的XML的方法:

public static string RemoveUtf8ByteOrderMark(string xml)
{
    string byteOrderMarkUtf8 = Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());
    if (xml.StartsWith(byteOrderMarkUtf8))
    {
        xml = xml.Remove(0, byteOrderMarkUtf8.Length);
    }
    return xml;
}

3 个答案:

答案 0 :(得分:15)

这是因为数据在流的开头包含unicode或utf8 BOM marks

您需要跳过流中的任何字节顺序标记 - 您可以使用System.Text.Encoding.GetPreamble()方法识别这些标记。

答案 1 :(得分:1)

您可以使用StreamReaderMemoryStream中的数据转换为字符串来解决您的问题:

public static string SerializeObject<T>(object o)
{
    using (MemoryStream ms = new MemoryStream())
    {
        XmlSerializer xs = new XmlSerializer(typeof(T));
        using (XmlWriter xtw = XmlWriter.Create(ms))
        {
            xs.Serialize(xtw, o);
            xtw.Flush();
            ms.Seek(0, SeekOrigin.Begin);
            using (StreamReader reader = new StreamReader(ms))
            {
                return reader.ReadToEnd();
            }
        }
    }
}

答案 2 :(得分:-1)

以上所有内容都是正确的,以下是您应该使用的代码而不是您的代码来跳过BOM:

 public static string SerializeObject<T>(object o)
        {
            MemoryStream ms = new MemoryStream();
            XmlSerializer xs = new XmlSerializer(typeof(T));
            //here is my code
            UTF8Encoding encoding = new UTF8Encoding(false);
            XmlTextWriter xtw = new XmlTextWriter(ms, encoding);
            //XmlTextWriter xtw = new XmlTextWriter(ms, Encoding.UTF8);
            xs.Serialize(xtw, o);
            ms = (MemoryStream)xtw.BaseStream;
            return StringHelpers.UTF8ByteArrayToString(ms.ToArray());
       }

通过在构造函数中指定 false ,您会说“未提供BOM”。请享用! =)