Question

I have an XmlReader over this XML string:

<?xml version="1.0" encoding="UTF-8" ?>
<story id="1224488641nL21535800" date="20 Oct 2008" time="07:44">
<title>PRESS DIGEST - PORTUGAL - Oct 20</title>
<text>
<p>    LISBON, Oct 20 (Reuters) - Following are some of the main
 stories in Portuguese newspapers on Monday. Reuters has not
verified these stories and does not vouch for their accuracy. </p>
<p>More HTML stuff here</p>
</text>
</story>

I created an XSD and a corresponding class for deserialization.

[System.Xml.Serialization.XmlRootAttribute(Namespace="", IsNullable=false)]
public class story {
    [System.Xml.Serialization.XmlAttributeAttribute()]
    public string id;
    [System.Xml.Serialization.XmlAttributeAttribute()]
    public string date;
    [System.Xml.Serialization.XmlAttributeAttribute()]
    public string time;
    public string title;
    public string text;
}

Then I create an instance of the class with XmlSerializer's Deserialize method.

XmlSerializer ser = new XmlSerializer(typeof(story));
return (story)ser.Deserialize(xr);

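Now the text member of story is always empty. How can I change my story class so that the XML is parsed as expected?

Edit:

Using XmlText does not work, and I have no control over the XML I am parsing.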

Answer 1

I found a very unsatisfying solution.

Change the class like this (ugh!)

// ...
[XmlElement("HACK - this should never match anything")]
public string text;
// ...

and change the calling code like this (yuck!)

XmlSerializer ser = new XmlSerializer(typeof(story));
string text = string.Empty;
ser.UnknownElement += delegate(object sender, XmlElementEventArgs e) {
    if (e.Element.Name != "text")
        throw new XmlException(
              string.Format(CultureInfo.InvariantCulture, 
             "Unknown element '{0}' cannot be deserialized.",
             e.Element.Name));
    text += e.Element.InnerXml;
};

story result = (story)ser.Deserialize(xr);
result.text = text;
return result;

This is a pretty bad approach, because it breaks encapsulation. Is there a better way?

Answer 2

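If the text tag only ever contains p tags, here is what I would do; it may help in the short term.

Instead of giving story a text field of type string, make it a string array. Then you can use the right XmlArray attribute (I can't remember the exact name, something like XmlArrayItemAttribute) with the right parameters to make it look like: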

<text>
   <p>blah</p>
   <p>blib</p>
</text>

That gets you closer, but it isn't quite what you need.

Another option is to create a class:

public class Text //Obviously a bad name for a class...
{
   public string[] p;
   public string[] pre;
}

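Again, use XmlArray attributes to make it look right. I'm not sure whether they can be configured like that, because I have only used them for simple types before.

Edit:

Using: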

using System.Xml.Serialization;

[System.Xml.Serialization.XmlRootAttribute(Namespace = "", IsNullable = false)]
public class story
{
    [System.Xml.Serialization.XmlAttributeAttribute()]
    public string id;
    [System.Xml.Serialization.XmlAttributeAttribute()]
    public string date;
    [System.Xml.Serialization.XmlAttributeAttribute()]
    public string time;
    public string title;

    // Each <p> child of <text> becomes one entry in the array.
    [XmlArrayItem("p")]
    public string[] text;
}

This works with the provided XML, but making the class a bit more complicated ends up with something like:

<text>
   <p>
      <p>qwertyuiop</p>
      <p>asdfghjkl</p>
   </p>
   <pre>
      <pre>stuff</pre>
      <pre>nonsense</pre>
   </pre>
</text>

which is obviously not what is wanted.

Answer 3

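You could implement IXmlSerializable on your class and handle the inner elements there. That keeps the code that deserializes the data inside the target class, which avoids the encapsulation problem. This is a simple enough data type that the code should be easy to write.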
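A minimal sketch of that idea, assuming the element and attribute names from the question's sample and assuming text should hold the raw inner markup of <text> as a string (the question does not say whether it wants markup or plain text). Because story reads its own XML in ReadXml, the calling code stays the two-line Deserialize call from the question. Note that once a class implements IXmlSerializable, attributes such as [XmlAttribute] on its fields are no longer consulted.

```csharp
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Sketch: story parses its own XML, so the raw markup inside <text>
// is captured here rather than in an UnknownElement handler.
public class story : IXmlSerializable
{
    public string id;
    public string date;
    public string time;
    public string title;
    public string text; // raw inner XML of <text>, e.g. "<p>...</p><p>...</p>"

    // Conventionally null for IXmlSerializable implementations.
    public XmlSchema GetSchema() => null;

    public void ReadXml(XmlReader reader)
    {
        // XmlSerializer positions the reader on the <story> start tag.
        reader.MoveToContent();
        id = reader.GetAttribute("id");
        date = reader.GetAttribute("date");
        time = reader.GetAttribute("time");

        bool isEmpty = reader.IsEmptyElement;
        reader.ReadStartElement(); // consume <story>
        if (isEmpty) return;

        reader.MoveToContent();
        while (reader.NodeType == XmlNodeType.Element)
        {
            switch (reader.LocalName)
            {
                case "title":
                    title = reader.ReadElementContentAsString();
                    break;
                case "text":
                    // Returns the <p> markup as a string and moves past </text>.
                    text = reader.ReadInnerXml();
                    break;
                default:
                    reader.Skip(); // ignore elements this sketch doesn't know
                    break;
            }
            reader.MoveToContent();
        }
        reader.ReadEndElement(); // consume </story>
    }

    public void WriteXml(XmlWriter writer)
    {
        // XmlSerializer has already written the <story> start tag.
        if (id != null) writer.WriteAttributeString("id", id);
        if (date != null) writer.WriteAttributeString("date", date);
        if (time != null) writer.WriteAttributeString("time", time);
        writer.WriteElementString("title", title);
        writer.WriteStartElement("text");
        if (text != null) writer.WriteRaw(text); // write the markup back unescaped
        writer.WriteEndElement();
    }
}
```

With this class, `(story)new XmlSerializer(typeof(story)).Deserialize(xr)` fills text with the HTML inside <text>, tags included.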

Answer 4

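It looks to me like the XML is not right. Because you use HTML tags inside the text tag, those HTML tags are interpreted as XML. You should wrap the data in CDATA so it is interpreted correctly, or escape < and > as &lt; and &gt;.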
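A small sketch of what this answer means, applicable only if whoever produces the XML can change it (the question says the asker cannot). With the HTML wrapped in CDATA, or with < and > escaped, the question's original class with a plain string text field receives the markup as ordinary text. The class is trimmed to a few fields, and CdataDemo.Parse is a hypothetical helper name used for illustration.

```csharp
using System.IO;
using System.Xml.Serialization;

// The question's class, trimmed: text stays a plain string.
public class story
{
    [XmlAttribute] public string id;
    public string title;
    public string text;
}

public static class CdataDemo
{
    // Hypothetical helper: deserializes a story from an XML string.
    public static story Parse(string xml)
    {
        var ser = new XmlSerializer(typeof(story));
        using (var reader = new StringReader(xml))
        {
            return (story)ser.Deserialize(reader);
        }
    }
}
```

For input such as `<text><![CDATA[<p>a</p>]]></text>` or `<text>&lt;p&gt;a&lt;/p&gt;</text>`, text ends up as the string `<p>a</p>`.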

Answer 5

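Since you have no control over the XML, you could read it with a StreamReader instead. XmlReader interprets the HTML tags as XML, which is not what you want.

However, XmlSerializer will strip the HTML tags inside the text tag.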

Answer 6

It might be more elegant to use the XmlAnyElement attribute instead of handling the UnknownElement event.
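A minimal sketch of that suggestion, assuming the rest of the class stays as in the question: the <text> element is captured as an XmlElement through [XmlAnyElement("text")], so no UnknownElement handler is needed and the logic stays inside the class. The field name textElement and the text accessor are hypothetical names chosen for illustration.

```csharp
using System.Xml;
using System.Xml.Serialization;

public class story
{
    [XmlAttribute] public string id;
    [XmlAttribute] public string date;
    [XmlAttribute] public string time;
    public string title;

    // The whole <text> element, markup included, is kept as a DOM node.
    [XmlAnyElement("text")]
    public XmlElement textElement;

    // Hypothetical convenience accessor; [XmlIgnore] keeps the
    // serializer from treating it as a separate element.
    [XmlIgnore]
    public string text => textElement?.InnerXml;
}
```

The calling code is unchanged from the question: `(story)new XmlSerializer(typeof(story)).Deserialize(xr)`, after which text returns the inner HTML of <text>.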

Answer 7

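Have you tried xsd.exe? It lets you create an XSD from an XML document, and then generate classes from that XSD that should be suitable for XML deserialization.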

Answer 8

Please also take a look at a similar question I asked... it may help answer your question.

Answer 9

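I ran into the same problem after using XSD.exe to generate an XSD from the XML and then generating classes from the XSD. In the generated class file I added an [XmlText] attribute in front of the object class (called P in my case, because it inferred the <p> tags as XML nodes), and it worked right away. It pulls the full HTML content inside the parent node into that P object, which I then renamed to something more useful.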

How do I get the contents of an XML element using XmlSerializer?
