How to split an XML file into multiple XML files based on nodes

Asked: 2013-01-22 09:49:32

Tags: c# xml split

I have an XML file like this:

<?xml version="1.0"?>
<EMR>
  <CustomTextBox>
    <Text>WNL</Text>
    <Type>TextBox</Type>
    <Width>500</Width>
    <id>txt1</id>
  </CustomTextBox>

  <CustomTextBox>
    <Text>WNL</Text>
    <Type>TextBox</Type>
    <Width>500</Width>
    <id>txt2</id>
  </CustomTextBox>

  <AllControlsCount>
    <Width>0</Width>
    <id>ControlsID</id>
  </AllControlsCount>
</EMR>

I want to split this XML file into three files, one for each child node of the root.

File 1:

<?xml version="1.0"?>
<CustomTextBox>
  <Text>WNL</Text>
  <Type>TextBox</Type>
  <Width>500</Width>
  <id>txt1</id>
</CustomTextBox>

File 2:

<?xml version="1.0"?>
<CustomTextBox>
  <Text>WNL</Text>
  <Type>TextBox</Type>
  <Width>500</Width>
  <id>txt2</id>
</CustomTextBox>

File 3:

<?xml version="1.0"?>
<AllControlsCount>
  <Width>0</Width>
  <id>ControlsID</id>
</AllControlsCount>

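The nodes are also dynamic; they may change. How can I split this XML file into multiple files based on its nodes? If anyone knows, please share.

4 answers:

Answer 0 (score: 8)

Try LINQ to XML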

var xDoc = XDocument.Parse(Resource1.XMLFile1); // loading source xml
var xmls = xDoc.Root.Elements().ToArray(); // split into elements

for(int i = 0;i< xmls.Length;i++)
{
    // write each element into different file
    using (var file = File.CreateText(string.Format("xml{0}.xml", i + 1)))
    {
        file.Write(xmls[i].ToString());
    }
}

It selects all elements defined inside the root element and writes the content of each one to a separate file.
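Note that XElement.ToString() does not emit an XML declaration, while the files in the question start with one. A minimal sketch of one way to get it, assuming each element is copied into its own XDocument before saving (the class name XmlSplitter, the method name and the outputDir parameter are made up for illustration):

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Linq;

public static class XmlSplitter
{
    // Hypothetical helper: writes each child of the root element to
    // xml1.xml, xml2.xml, ... inside outputDir and returns the paths.
    public static List<string> SplitByRootChildren(string xml, string outputDir)
    {
        var paths = new List<string>();
        XDocument source = XDocument.Parse(xml);
        int index = 0;

        foreach (XElement child in source.Root.Elements())
        {
            // Copy the element into its own document so that Save()
            // writes an XML declaration at the top of the file.
            var doc = new XDocument(
                new XDeclaration("1.0", "utf-8", null),
                new XElement(child));

            string path = Path.Combine(outputDir, string.Format("xml{0}.xml", ++index));
            doc.Save(path);
            paths.Add(path);
        }

        return paths;
    }
}
```

It could be called as XmlSplitter.SplitByRootChildren(Resource1.XMLFile1, "."), which keeps the file naming used in the answer above.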

Answer 1 (score: 5)

It is simpler with LINQ to XML: you can use the XElement.Save method to save any element as a separate XML file:

XDocument xdoc = XDocument.Load(path_to_xml);
int index = 0;
foreach (var element in xdoc.Root.Elements())
    element.Save(++index + ".xml");

Or as a single statement:

XDocument.Load(path_to_xml).Root.Elements()
         .Select((e, i) => new { Element = e, File = ++i + ".xml" })
         .ToList().ForEach(x => x.Element.Save(x.File));
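Since the question says the node names can change, it may help to put the element name into the file name so the outputs are easier to tell apart. This is only a sketch of that variation; the class name, method name and the "index_name.xml" naming scheme are assumptions, not part of the answer:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Linq;

public static class NamedXmlSplitter
{
    // Hypothetical helper: saves each child of the root as
    // "<index>_<element name>.xml" inside outputDir.
    public static List<string> SplitWithElementNames(string inputPath, string outputDir)
    {
        var paths = new List<string>();
        int index = 0;

        foreach (XElement element in XDocument.Load(inputPath).Root.Elements())
        {
            // The index keeps names unique when several siblings share a name.
            string fileName = string.Format("{0}_{1}.xml", ++index, element.Name.LocalName);
            string path = Path.Combine(outputDir, fileName);

            element.Save(path); // XElement.Save writes the XML declaration too
            paths.Add(path);
        }

        return paths;
    }
}
```

For the file in the question this would produce 1_CustomTextBox.xml, 2_CustomTextBox.xml and 3_AllControlsCount.xml.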

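Answer 2 (score: 1)

You can use the XmlTextReader and XmlWriter classes to do this. But you need to know where to start creating a new XML file. Looking at your example, you want to split out each node contained in the root node.

This means that once you start reading the XML file, you need to make sure you are inside the root node, and then you need to track how deep you are in the XML, so that you can close the file when you reach the next node in the root node.

See this example: I read the XML from file.xml and open an XML writer. When I reach the first node contained in the root node, I start writing elements.

I keep the depth in the variable "treeDepth", which represents the structural depth of the XML tree.

Depending on the node currently being read, I take an action. When I reach an end element at tree depth 1, I am back in the root node, so I close the current XML file and open a new one.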

using System.Xml;

XmlTextReader reader = new XmlTextReader("file.xml");

XmlWriter writer = null;
int fileNumber = 0;
int treeDepth = 0;

while (reader.Read())
{
    switch (reader.NodeType)
    {
        case XmlNodeType.Element:

            //
            // Skip the root node (depth 0). A node at depth 1 is a child
            // of the root, so a new output file is opened for it.
            //

            if (treeDepth == 1)
            {
                writer = XmlWriter.Create("file" + (++fileNumber) + ".xml");
                writer.WriteStartDocument();
            }

            if (treeDepth > 0)
                writer.WriteStartElement(reader.Name);

            treeDepth++;

            break;
        case XmlNodeType.Text:

            //
            // Write text here (only inside a node that is being copied)
            //

            if (treeDepth > 1)
                writer.WriteString(reader.Value);

            break;
        case XmlNodeType.EndElement:

            //
            // Close the end element; back at depth 1, close the file
            //

            treeDepth--;

            if (treeDepth > 0)
                writer.WriteEndElement();

            if (treeDepth == 1)
            {
                writer.WriteEndDocument();
                writer.Close();
            }

            break;
    }
}

// Attributes and empty elements such as <a/> are not handled here.
reader.Close();

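Note that this code does not fully solve your problem; it only explains the logic needed to solve it completely.

For more help with XML readers and writers, see the following links: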

http://support.microsoft.com/kb/307548

http://www.dotnetperls.com/xmlwriter
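The depth tracking described in this answer can also be left to the framework. The following is a rough sketch, not the answerer's code, that streams through the file with XmlReader and uses XNode.ReadFrom to materialize one child of the root at a time, so attributes and empty elements are copied as well. The class name, method name and the "fileN.xml" naming are assumptions made for the example:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class StreamingXmlSplitter
{
    // Hypothetical helper: saves every child of the root element of
    // inputPath as file1.xml, file2.xml, ... inside outputDir.
    public static List<string> Split(string inputPath, string outputDir)
    {
        var paths = new List<string>();
        int index = 0;

        using (XmlReader reader = XmlReader.Create(inputPath))
        {
            reader.MoveToContent(); // position on the root element
            reader.Read();          // step to the first node inside the root

            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Depth == 1)
                {
                    // ReadFrom consumes the whole element and leaves the
                    // reader on the node that follows it, so no extra
                    // Read() call is needed in this branch.
                    var element = (XElement)XNode.ReadFrom(reader);

                    string path = Path.Combine(outputDir, "file" + (++index) + ".xml");
                    element.Save(path);
                    paths.Add(path);
                }
                else
                {
                    reader.Read(); // skip whitespace, comments, end tags
                }
            }
        }

        return paths;
    }
}
```

Only one child element is held in memory at a time, which is the main reason to prefer a reader over loading the whole document when the input is large.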

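Answer 3 (score: 0)

I took Legoless's answer and extended it into a version that works for me, so I am sharing it. For my needs, I had to put multiple entries into each file rather than the single entry per file shown in the original question, which meant I needed to keep the higher-level elements to make sure the generated XML files are valid.

So you provide the level you want to split on and the number of entries you want per file.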

public class XMLFileManager
{        

    public List<string> SplitXMLFile(string fileName, int startingLevel, int numEntriesPerFile)
    {
        List<string> resultingFilesList = new List<string>();

        XmlReaderSettings readerSettings = new XmlReaderSettings();
        readerSettings.DtdProcessing = DtdProcessing.Parse;
        XmlReader reader = XmlReader.Create(fileName, readerSettings);

        XmlWriter writer = null;
        int fileNum = 1;
        int entryNum = 0;
        bool writerIsOpen = false;
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;
        settings.NewLineOnAttributes = true;

        Dictionary<int, XmlNodeItem> higherLevelNodes = new Dictionary<int, XmlNodeItem>();
        int hlnCount = 0;

        string fileIncrementedName = GetIncrementedFileName(fileName, fileNum);
        resultingFilesList.Add(fileIncrementedName);
        writer = XmlWriter.Create(fileIncrementedName, settings);
        writerIsOpen = true;
        writer.WriteStartDocument();

        int treeDepth = 0;

        while (reader.Read())
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:                        

                    treeDepth++;

                    if (treeDepth == startingLevel)
                    {
                        entryNum++;
                        if (entryNum == 1)
                        {                                
                            if (fileNum > 1)
                            {
                                fileIncrementedName = GetIncrementedFileName(fileName, fileNum);
                                resultingFilesList.Add(fileIncrementedName);
                                writer = XmlWriter.Create(fileIncrementedName, settings);
                                writerIsOpen = true;
                                writer.WriteStartDocument();
                                for (int d = 1; d <= higherLevelNodes.Count; d++)
                                {
                                    XmlNodeItem xni = higherLevelNodes[d];
                                    switch (xni.XmlNodeType)
                                    {
                                        case XmlNodeType.Element:
                                            writer.WriteStartElement(xni.NodeValue);
                                            break;
                                        case XmlNodeType.Text:
                                            writer.WriteString(xni.NodeValue);
                                            break;
                                        case XmlNodeType.CDATA:
                                            writer.WriteCData(xni.NodeValue);
                                            break;
                                        case XmlNodeType.Comment:
                                            writer.WriteComment(xni.NodeValue);
                                            break;
                                        case XmlNodeType.EndElement:
                                            writer.WriteEndElement();
                                            break;
                                    }
                                }
                            }
                        }
                    }

                    if (writerIsOpen)
                    {
                        writer.WriteStartElement(reader.Name);
                    }

                    if (treeDepth < startingLevel)
                    {
                        hlnCount++;
                        XmlNodeItem xni = new XmlNodeItem();
                        xni.XmlNodeType = XmlNodeType.Element;
                        xni.NodeValue = reader.Name;
                        higherLevelNodes.Add(hlnCount, xni);
                    }

                    break;
                case XmlNodeType.Text:

                    if (writerIsOpen)
                    {
                        writer.WriteString(reader.Value);
                    }

                    if (treeDepth < startingLevel)
                    {
                        hlnCount++;
                        XmlNodeItem xni = new XmlNodeItem();
                        xni.XmlNodeType = XmlNodeType.Text;
                        xni.NodeValue = reader.Value;
                        higherLevelNodes.Add(hlnCount, xni);
                    }

                    break;
                case XmlNodeType.CDATA:

                    if (writerIsOpen)
                    {
                        writer.WriteCData(reader.Value);
                    }

                    if (treeDepth < startingLevel)
                    {
                        hlnCount++;
                        XmlNodeItem xni = new XmlNodeItem();
                        xni.XmlNodeType = XmlNodeType.CDATA;
                        xni.NodeValue = reader.Value;
                        higherLevelNodes.Add(hlnCount, xni);
                    }

                    break;
                case XmlNodeType.Comment:

                    if (writerIsOpen)
                    {
                        writer.WriteComment(reader.Value);
                    }

                    if (treeDepth < startingLevel)
                    {
                        hlnCount++;
                        XmlNodeItem xni = new XmlNodeItem();
                        xni.XmlNodeType = XmlNodeType.Comment;
                        xni.NodeValue = reader.Value;
                        higherLevelNodes.Add(hlnCount, xni);
                    }

                    break;
                case XmlNodeType.EndElement:

                    if (entryNum == numEntriesPerFile && treeDepth == startingLevel || treeDepth==1)
                    {
                        if (writerIsOpen)
                        {
                            fileNum++;
                            writer.WriteEndDocument();
                            writer.Close();
                            writerIsOpen = false;
                            entryNum = 0;
                        }                            
                    }
                    else
                    {
                        if (writerIsOpen)
                        {
                            writer.WriteEndElement();
                        }

                        if (treeDepth < startingLevel)
                        {
                            hlnCount++;
                            XmlNodeItem xni = new XmlNodeItem();
                            xni.XmlNodeType = XmlNodeType.EndElement;
                            xni.NodeValue = string.Empty;
                            higherLevelNodes.Add(hlnCount, xni);
                        }
                    }

                    treeDepth--;

                    break;
            }
        }

        reader.Close(); // release the input file

        return resultingFilesList;
    }

    private string GetIncrementedFileName(string fileName, int fileNum)
    {
        return fileName.Replace(".xml", "") + "_" + fileNum + "_" + ".xml";
    }
}

public class XmlNodeItem
{        
    public XmlNodeType XmlNodeType { get; set; }
    public string NodeValue { get; set; }
}

Sample usage:

int startingLevel = 2; //EMR is level 1, while the entries of CustomTextBox and AllControlsCount 
                       //are at Level 2. The question wants to split on those Level 2 items 
                       //and so this parameter is set to 2.
int numEntriesPerFile = 1;  //Question wants 1 entry per file which will result in 3 files,  
                            //each with one entry.

XMLFileManager xmlFileManager = new XMLFileManager();
List<string> resultingFilesList = xmlFileManager.SplitXMLFile("before_split.xml", startingLevel, numEntriesPerFile);

Result of running it against the XML file from the question:

File 1:

<?xml version="1.0" encoding="utf-8"?>
<EMR>
  <CustomTextBox>
    <Text>WNL</Text>
    <Type>TextBox</Type>
    <Width>500</Width>
    <id>txt1</id>
  </CustomTextBox>
</EMR>

File 2:

<?xml version="1.0" encoding="utf-8"?>
<EMR>
  <CustomTextBox>
    <Text>WNL</Text>
    <Type>TextBox</Type>
    <Width>500</Width>
    <id>txt2</id>
  </CustomTextBox>
</EMR>

File 3:

<?xml version="1.0" encoding="utf-8"?>
<EMR>
  <AllControlsCount>
    <Width>0</Width>
    <id>ControlsID</id>
  </AllControlsCount>
</EMR>

Another example, with a deeper split level and multiple entries per file:

int startingLevel = 4; //splitting on the 4th level down which is <ITEM>
int numEntriesPerFile = 2;//2 entries per file. If instead you used 3, then the result 
                          //would be 3 entries in the first file and 1 entry in the second file.

XMLFileManager xmlFileManager = new XMLFileManager();
List<string> resultingFilesList = xmlFileManager.SplitXMLFile("another_example.xml", startingLevel, numEntriesPerFile);

Original file:

<?xml version="1.0" encoding="utf-8"?>
<TOP_LEVEL>
  <RESPONSE>
    <DATETIME>2019-04-03T21:39:40Z</DATETIME>  
    <ITEM_LIST>
      <ITEM>
        <ID>1</ID>
        <ABC>Some Text 1</ABC>        
        <TESTDATA><![CDATA[Here is some c data]]></TESTDATA>        
        <A_DATETIME>2019-04-01T01:00:00Z</A_DATETIME>        
        <A_DEEPER_LIST>
          <DEEPER_LIST_ITEM>
            <DLID>42</DLID>            
            <TYPE>Example</TYPE>            
            <IS_ENABLED>1</IS_ENABLED>            
          </DEEPER_LIST_ITEM>
        </A_DEEPER_LIST>
      </ITEM>      
      <ITEM>
        <ID>2</ID>
        <ABC>Some Text 2</ABC>        
        <TESTDATA><![CDATA[Here is some c data]]></TESTDATA>        
        <A_DATETIME>2019-04-01T01:00:00Z</A_DATETIME>        
        <A_DEEPER_LIST>
          <DEEPER_LIST_ITEM>
            <DLID>53</DLID>            
            <TYPE>Example</TYPE>            
            <IS_ENABLED>1</IS_ENABLED>            
          </DEEPER_LIST_ITEM>
        </A_DEEPER_LIST>
      </ITEM>
      <ITEM>
        <ID>3</ID>
        <ABC>Some Text 3</ABC>        
        <TESTDATA><![CDATA[Here is some c data]]></TESTDATA>        
        <A_DATETIME>2019-04-01T01:00:00Z</A_DATETIME>        
        <A_DEEPER_LIST>
          <DEEPER_LIST_ITEM>
            <DLID>1128</DLID>            
            <TYPE>Example</TYPE>            
            <IS_ENABLED>1</IS_ENABLED>            
          </DEEPER_LIST_ITEM>
        </A_DEEPER_LIST>
      </ITEM>
      <ITEM>
        <ID>4</ID>
        <ABC>Some Text 4</ABC>        
        <TESTDATA><![CDATA[Here is some c data]]></TESTDATA>        
        <A_DATETIME>2019-04-01T01:00:00Z</A_DATETIME>        
        <A_DEEPER_LIST>
          <DEEPER_LIST_ITEM>
            <DLID>1955</DLID>            
            <TYPE>Example</TYPE>            
            <IS_ENABLED>1</IS_ENABLED>            
          </DEEPER_LIST_ITEM>
        </A_DEEPER_LIST>
      </ITEM>
    </ITEM_LIST>
  </RESPONSE>
</TOP_LEVEL>

Resulting files:

First file:

<?xml version="1.0" encoding="utf-8"?>
<TOP_LEVEL>
  <RESPONSE>
    <DATETIME>2019-04-03T21:39:40Z</DATETIME>
    <ITEM_LIST>
      <ITEM>
        <ID>1</ID>
        <ABC>Some Text 1</ABC>
        <TESTDATA><![CDATA[Here is some c data]]></TESTDATA>
        <A_DATETIME>2019-04-01T01:00:00Z</A_DATETIME>
        <A_DEEPER_LIST>
          <DEEPER_LIST_ITEM>
            <DLID>42</DLID>
            <TYPE>Example</TYPE>
            <IS_ENABLED>1</IS_ENABLED>
          </DEEPER_LIST_ITEM>
        </A_DEEPER_LIST>
      </ITEM>
      <ITEM>
        <ID>2</ID>
        <ABC>Some Text 2</ABC>
        <TESTDATA><![CDATA[Here is some c data]]></TESTDATA>
        <A_DATETIME>2019-04-01T01:00:00Z</A_DATETIME>
        <A_DEEPER_LIST>
          <DEEPER_LIST_ITEM>
            <DLID>53</DLID>
            <TYPE>Example</TYPE>
            <IS_ENABLED>1</IS_ENABLED>
          </DEEPER_LIST_ITEM>
        </A_DEEPER_LIST>
      </ITEM>
    </ITEM_LIST>
  </RESPONSE>
</TOP_LEVEL>

Second file:

<?xml version="1.0" encoding="utf-8"?>
<TOP_LEVEL>
  <RESPONSE>
    <DATETIME>2019-04-03T21:39:40Z</DATETIME>
    <ITEM_LIST>
      <ITEM>
        <ID>3</ID>
        <ABC>Some Text 3</ABC>
        <TESTDATA><![CDATA[Here is some c data]]></TESTDATA>
        <A_DATETIME>2019-04-01T01:00:00Z</A_DATETIME>
        <A_DEEPER_LIST>
          <DEEPER_LIST_ITEM>
            <DLID>1128</DLID>
            <TYPE>Example</TYPE>
            <IS_ENABLED>1</IS_ENABLED>
          </DEEPER_LIST_ITEM>
        </A_DEEPER_LIST>
      </ITEM>
      <ITEM>
        <ID>4</ID>
        <ABC>Some Text 4</ABC>
        <TESTDATA><![CDATA[Here is some c data]]></TESTDATA>
        <A_DATETIME>2019-04-01T01:00:00Z</A_DATETIME>
        <A_DEEPER_LIST>
          <DEEPER_LIST_ITEM>
            <DLID>1955</DLID>
            <TYPE>Example</TYPE>
            <IS_ENABLED>1</IS_ENABLED>
          </DEEPER_LIST_ITEM>
        </A_DEEPER_LIST>
      </ITEM>
    </ITEM_LIST>
  </RESPONSE>
</TOP_LEVEL>
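For the simpler case where the entries are direct children of the root (startingLevel = 2 in the answer above), the same "several entries per file, wrapped in the original root" idea can be sketched with LINQ to XML. This is an illustration under that assumption, not a replacement for the reader-based class; the names XmlBatcher and SplitInBatches are hypothetical, and it loads the whole document into memory:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlBatcher
{
    // Hypothetical helper: returns one document per batch of root children.
    // Each document repeats the root element (name and attributes only).
    public static List<XDocument> SplitInBatches(XDocument source, int entriesPerFile)
    {
        if (entriesPerFile < 1)
            throw new ArgumentOutOfRangeException("entriesPerFile");

        var result = new List<XDocument>();
        List<XElement> children = source.Root.Elements().ToList();

        for (int start = 0; start < children.Count; start += entriesPerFile)
        {
            // Recreate the root, then add the next batch of entries.
            // Nodes that already have a parent are cloned when added,
            // so the source document is left unchanged.
            var root = new XElement(source.Root.Name, source.Root.Attributes());
            root.Add(children.Skip(start).Take(entriesPerFile));

            result.Add(new XDocument(root));
        }

        return result;
    }
}
```

Each returned document can then be written with XDocument.Save. Splitting at a deeper level, as in the ITEM example, would need the ancestors of the split level to be rebuilt as well, which this sketch does not attempt.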