Converting indented text to XML

Time: 2013-02-06 18:57:08

Tags: c# xml schema

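I have a text-based file in which the indentation of each line represents the nesting of the tags in an XML file.

How can I convert this text into a sample XML document in C#? I'm a bit lost. I assume I have to count the spaces and look back through the list to determine when a tag should be closed.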

sampleroot                                          
  rootHeader                                        
    miscInformation                                 
        Creation                                
        DocumentIdentification                              
            Identifier                          
            Message_Type                            
            Notes                           
        StandardDocumentationIdentification                             
            Standard                            
            Version                         
    Receiver                                    
        Name                                
        lok                             
        Location                                
    Sender                                  
        Name                                
        lok2                                
    msgref                                  
        DocumentIdentifier                              
        HoldInformation                             
            Name                            
            Date                            
        ReleaseInformation                              
            Date                            
    HoldDocumentReference                                   
        AlternativeIdentifier                               
            Authority                           
            Identifier                          
        Notes                               
    ReleaseDocumentReference                                    
        AlternativeIdentifier                               
            Authority                           
            Identifier                          
        Notes       

2 answers:

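Answer 0 (score: 3)

The following code works for input documents indented with four spaces per level (look carefully at the input document below). It is only a sample: you could, of course, extend it to support tab-indented input documents.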

// Requires: using System.IO; using System.Linq; using System.Xml;
private static void ConvertToXml(Stream inputStream, Stream outputStream)
{
    const int oneIndentLength = 4; // One level indentation - four spaces.
    var xmlWriterSettings = new XmlWriterSettings
        {
            Indent = true
        };

    using (var streamReader = new StreamReader(inputStream))
    using (var xmlWriter = XmlWriter.Create(outputStream, xmlWriterSettings))
    {
        int previousIndent = -1; // There is no previous indent.
        string line;
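        while ((line = streamReader.ReadLine()) != null)
        {
            // Skip blank lines: an empty element name would make WriteStartElement throw.
            if (string.IsNullOrWhiteSpace(line))
            {
                continue;
            }

            var indent = line.TakeWhile(ch => ch == ' ').Count();
            indent /= oneIndentLength;

            var elementName = line.Trim();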

            if (indent <= previousIndent)
            {
                // The indent is the same or is less than previous one - write end for previous element.
                xmlWriter.WriteEndElement();

                var indentDelta = previousIndent - indent;
                for (int i = 0; i < indentDelta; i++)
                {
                    // Return: leave the node.
                    xmlWriter.WriteEndElement();
                }
            }

            xmlWriter.WriteStartElement(elementName);

            // Save the indent of the previous line.
            previousIndent = indent;
        }
    }
}

Client code:

using (var inputStream = File.OpenRead(@"Input file path"))
using (var outputStream = File.Create(@"Output file path"))
{
    ConvertToXml(inputStream, outputStream);
}
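
On the tab-indentation point mentioned at the top of this answer: a minimal sketch, in Python rather than C#, of computing the nesting level for lines indented with tabs, four-space groups, or both. The name `indent_level` and the rule that one tab counts as one level are assumptions made for illustration; they are not part of the code above.

```python
def indent_level(line, spaces_per_level=4):
    """Return the nesting level of a line indented with tabs and/or spaces.

    Assumption: one tab is one level, and `spaces_per_level` spaces are one level.
    """
    level = 0
    spaces = 0
    for ch in line:
        if ch == "\t":
            level += 1
        elif ch == " ":
            spaces += 1
        else:
            break  # first non-indentation character reached
    if spaces % spaces_per_level != 0:
        raise ValueError(
            "indentation is not a multiple of %d spaces: %r"
            % (spaces_per_level, line)
        )
    return level + spaces // spaces_per_level
```

In the C# method, the equivalent logic would take the place of the `TakeWhile` count and the division by `oneIndentLength`. Raising on a partial indent, rather than silently rounding down, is a design choice of this sketch.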

Input file:

sampleroot
    rootHeader
        miscInformation
            Creation
            DocumentIdentification
                Identifier
                Message_Type
                Notes
            StandardDocumentationIdentification
                Standard
                Version
        Receiver
            Name
            lok
            Location
        Sender
            Name
            lok2
        msgref
            DocumentIdentifier
            HoldInformation
                Name
                Date
            ReleaseInformation
                Date
        HoldDocumentReference
            AlternativeIdentifier
                Authority
                Identifier
            Notes
        ReleaseDocumentReference
            AlternativeIdentifier
                Authority
                Identifier
            Notes
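
The listing above is not identical to the one in the question, which steps by two spaces for the first two levels and by four spaces below that. A minimal sketch, in Python, of re-indenting such a file to a uniform four spaces per level before conversion; the function name `normalize_indentation` and the stack-of-indent-widths approach are assumptions for illustration, not something the answer itself describes.

```python
def normalize_indentation(lines, width=4):
    """Re-indent lines so that each nesting depth is exactly `width` spaces.

    Depth is inferred from a stack of the raw indent widths seen so far,
    so the input may mix two-space and four-space steps.
    """
    stack = []   # raw indent widths of the currently open ancestors
    result = []
    for line in lines:
        name = line.strip()
        if not name:
            continue  # drop blank lines
        raw = len(line) - len(line.lstrip(" "))
        # Pop every ancestor that is indented at least as far as this line.
        while stack and stack[-1] >= raw:
            stack.pop()
        result.append(" " * (width * len(stack)) + name)
        stack.append(raw)
    return result


if __name__ == "__main__":
    sample = ["sampleroot", "  rootHeader", "    miscInformation", "        Creation"]
    for out in normalize_indentation(sample):
        print(out)
```

Trailing whitespace, which the question's listing contains, is removed as a side effect of `strip()`.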

Output file:

<?xml version="1.0" encoding="utf-8"?>
<sampleroot>
  <rootHeader>
    <miscInformation>
      <Creation />
      <DocumentIdentification>
        <Identifier />
        <Message_Type />
        <Notes />
      </DocumentIdentification>
      <StandardDocumentationIdentification>
        <Standard />
        <Version />
      </StandardDocumentationIdentification>
    </miscInformation>
    <Receiver>
      <Name />
      <lok />
      <Location />
    </Receiver>
    <Sender>
      <Name />
      <lok2 />
    </Sender>
    <msgref>
      <DocumentIdentifier />
      <HoldInformation>
        <Name />
        <Date />
      </HoldInformation>
      <ReleaseInformation>
        <Date />
      </ReleaseInformation>
    </msgref>
    <HoldDocumentReference>
      <AlternativeIdentifier>
        <Authority />
        <Identifier />
      </AlternativeIdentifier>
      <Notes />
    </HoldDocumentReference>
    <ReleaseDocumentReference>
      <AlternativeIdentifier>
        <Authority />
        <Identifier />
      </AlternativeIdentifier>
      <Notes />
    </ReleaseDocumentReference>
  </rootHeader>
</sampleroot>

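Answer 1 (score: 1)

def count_spaces(s):
    """Return the number of leading spaces in s."""
    return len(s) - len(s.lstrip(" "))


def to_xml(lines):
    """Convert lines of indented element names to an XML string."""
    xml = []
    close_stack = []  # (level, name) of every element that is still open
    for line in lines:
        name = line.strip()
        if not name:
            continue  # ignore blank lines
        level = count_spaces(line)
        # Close every open element at the same or a deeper level.
        while close_stack and close_stack[-1][0] >= level:
            xml.append("</" + close_stack.pop()[1] + ">")
        xml.append("<" + name + ">")
        close_stack.append((level, name))
    # Close whatever is still open at the end of the input.
    while close_stack:
        xml.append("</" + close_stack.pop()[1] + ">")
    return "".join(xml)


with open("input.txt") as file:  # "input.txt" is a placeholder path
    print(to_xml(file))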