Linq to XML: parsing unstructured elements

Time: 2016-07-13 15:34:48

Tags: c# xml linq

I am building an HTML response page for a mobile app from the following XML structure:

<?xml version="1.0" encoding="UTF-8"?>
<Volumes>
<paragraph>
</paragraph>
<paragraph>
 <text>Apple</text>
   <page>1</page>
 <remark>(Apple Inc.) </remark>
</paragraph>
<paragraph>
 <explanation>Apple Inc. is an American multinational technology company headquartered in Cupertino, California, that designs, develops, and sells consumer electronics, computer software, and online services</explanation> 
</paragraph>
<paragraph>
 <text>Dell</text>
 <remark>Dell Inc.</remark>
</paragraph>
<paragraph>
 <explanation>Dell Inc. is an American privately owned multinational computer technology company based in Round Rock, Texas, United States, that develops, sells, repairs, and supports computers and related products and services.</explanation> 
</paragraph>
<paragraph>
 <text>Michael Dell</text>
   <search>dell</search>
 <remark>born February 23, 1965</remark>
</paragraph>
<paragraph>
 <explanation> Michael Saul Dell (born February 23, 1965) is an American business magnate, investor, philanthropist, and author</explanation> 
</paragraph>
<paragraph>
 <explanation> Business Career : </explanation> 
</paragraph>
<paragraph>
 <explanation> While a freshman pre-med student at the University of Texas, Dell started an informal business putting together and selling upgrade kits for personal computers[11] in Room 2713 of the Dobie Center residential building. He then applied for a vendor license to bid on contracts for the State of Texas, winning bids by not having the overhead of a computer store</explanation> 
</paragraph>
<paragraph>
 <text>HP</text>
 <remark>Hewlett-Packard</remark>
</paragraph>
<paragraph>
 <explanation>Something here</explanation> 
</paragraph>
</Volumes>

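Basically, every paragraph element produces a new paragraph line, and each child element inside a paragraph determines how its text is formatted.

A paragraph/text element defines the heading of a section. It should be followed by another paragraph containing an explanation element. However, some explanations are split across several paragraphs.

I am not sure how to write a parser to read this file.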

Example output: [image]

PS: I can handle the formatting myself. I only need ideas for parsing the document efficiently, since each XML file can be about 2-3 MB.

1 answer:

Answer 0: (score: 1)

This answer is worth 1000 points

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

namespace ConsoleApplication2
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.xml";
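        static void Main(string[] args)
        {
            List<Topic> topics = new List<Topic>();
            Topic topic = null; // topic being filled; stays null until content is seen

            // Stream the file so only one <paragraph> is materialized at a time.
            using (XmlReader reader = XmlReader.Create(FILENAME))
            {
                while (!reader.EOF)
                {
                    // Move to the next <paragraph> unless the reader is already on one.
                    if (reader.Name != "paragraph")
                    {
                        reader.ReadToFollowing("paragraph");
                    }
                    if (!reader.EOF)
                    {
                        // ReadFrom consumes the whole element and advances past it.
                        XElement paragraph = (XElement)XElement.ReadFrom(reader);
                        foreach (XElement subPara in paragraph.Elements())
                        {
                            string name = subPara.Name.LocalName;
                            if (topic == null && name != "text")
                            {
                                // Content before the first <text>: keep it under an
                                // untitled topic instead of dereferencing null.
                                topic = new Topic();
                                topics.Add(topic);
                            }
                            switch (name)
                            {
                                case "text":
                                    // A <text> element starts a new topic.
                                    topic = new Topic();
                                    topics.Add(topic);
                                    topic.title = (string)subPara;
                                    break;
                                case "page":
                                    topic.page = (int?)subPara;
                                    break;
                                default:
                                    // remark, explanation, search, ...: store as (element name, text).
                                    KeyValuePair<string, string> newPara = new KeyValuePair<string, string>(
                                        name,
                                        (string)subPara
                                    );
                                    topic.paragraphs.Add(newPara);
                                    break;
                            }
                        }
                    }
                }
            }
        }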

    }
    public class Topic
    {
        public string title { get; set; }
        public int? page { get; set; }
        public List<KeyValuePair<string, string>> paragraphs { get; set; }
        public Topic()
        {
            paragraphs = new List<KeyValuePair<string, string>>();
        }
    }
 }
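
Since the files are only 2-3 MB, loading the whole document with LINQ to XML may also be acceptable. The following is only a minimal sketch of that alternative, not part of the answer above: the `TopicGrouper` class and its `Group` method are hypothetical names, the sample XML is a shortened stand-in for the real file, and skipping content that appears before the first text element is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class TopicGrouper
{
    // Hypothetical helper: returns one entry per <text> heading, paired with
    // the <explanation> strings that follow it, in document order.
    public static List<KeyValuePair<string, List<string>>> Group(string xml)
    {
        var result = new List<KeyValuePair<string, List<string>>>();
        List<string> current = null; // explanations of the topic being filled

        foreach (XElement paragraph in XDocument.Parse(xml).Root.Elements("paragraph"))
        {
            XElement heading = paragraph.Element("text");
            if (heading != null)
            {
                // A <text> child starts a new topic.
                current = new List<string>();
                result.Add(new KeyValuePair<string, List<string>>((string)heading, current));
            }

            if (current == null)
            {
                continue; // assumption: ignore anything before the first heading
            }

            // An explanation split over several paragraphs ends up in the same list.
            current.AddRange(paragraph.Elements("explanation").Select(e => ((string)e).Trim()));
        }
        return result;
    }

    public static void Main()
    {
        // Shortened stand-in for the real file.
        const string sample = @"<Volumes>
  <paragraph></paragraph>
  <paragraph><text>Dell</text><remark>Dell Inc.</remark></paragraph>
  <paragraph><explanation>First part.</explanation></paragraph>
  <paragraph><explanation> Second part. </explanation></paragraph>
  <paragraph><text>HP</text></paragraph>
  <paragraph><explanation>Something here</explanation></paragraph>
</Volumes>";

        foreach (var topic in Group(sample))
        {
            Console.WriteLine("{0}: {1} explanation paragraph(s)", topic.Key, topic.Value.Count);
        }
    }
}
```

For the sample above this prints "Dell: 2 explanation paragraph(s)" and "HP: 1 explanation paragraph(s)". The XmlReader version stays the safer choice if the files grow much larger, because it never holds the whole tree in memory.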