Template-based XML to CSV conversion

Time: 2016-06-21 15:31:11

Tags: c# xml linq csv xml-parsing

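I am new to C#, and I realize this is a hard and challenging task. I have a case where I need to convert XML to CSV based on a template. The template, a sample XML, and the expected CSV are listed below.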

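My template contains a set of columns that should become the header of the output CSV. We want to match each template column against the XML and check whether it exists. If the XML has a value for a template column, that value is added to the CSV; if not, the field is added as null, as in the example below.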

Also, the given XML sample has three types of rows:

  1. TRX
  2. TRXR
  3. TRXC

We only need to handle the first two row types; TRXC rows in the XML must be ignored.

      

Given template:

    var templateList = new List<string> // Treat this as my template
            {
               "ID",
               "Name",
               "Address",
               "Phone",
               "Email",
               "Gender"     
            };
    
      

Sample XML:

    <?xml version="1.0" encoding="utf-8"?>
    <StudentXML>
        <TRX ID="2" Name="Smita" Address="Pune" Gender="F" Phone="987654321"/>
        <TRX ID="2" Name="Ram"  Phone="3554321" Email="ram@mail.com" />
        <TRX ID="1" Name="John" Address="Mumbai" Phone="NULL" Email="John@mail.com" Gender="M" />
        <TRXR ID="3" Name="NULL" Address="Mumbai" Phone="121212" Email="Don@mail.com" Gender="M" />
        <TRXC ID="3" Name="Prem" Address="Mumbai" Phone="121212" Email="Prem@mail.com" Gender="M"/>
    </StudentXML>
    
      

Expected output:

    "ID", "Name",   "Address",    "Phone",           "Email",          "Gender" 
    "2",   "Smita",  "Pune",      "987654321",       "NULL",            "F"
    "2",   "Ram",    "NULL",      "3554321",         "ram@mail.com",    "NULL"
    "1",   "John",   "Mumbai",    "NULL",            "John@mail.com",   "M"
    "3",   "NULL",   "Mumbai",    "121212",          "Don@mail.com",    "M"
    

I tried using XML Reader, but converting an XML file with 600,000 rows to CSV takes far too long.

     var dataSet = new DataSet();
     dataSet.ReadXml("XML File Name"); // This line takes far too long
    

I would be very grateful if anyone could help me with this.

3 Answers:

Answer 0 (score: 1)

You can use LINQ to XML to read the data and then convert it to CSV, like this:

// Requires: using System; using System.IO; using System.Linq; using System.Text; using System.Xml.Linq;
XElement students = XElement.Load("YourXml.xml");
string csv =
    (from el in students.Elements()
     where el.Name != "TRXC" // skip TRXC rows
     select
        // A missing attribute casts to null and is formatted as an empty field.
        // {6} is the line break, so there is no trailing comma after the last column.
        String.Format("{0},{1},{2},{3},{4},{5}{6}",
            (string)el.Attribute("ID"),
            (string)el.Attribute("Name"),
            (string)el.Attribute("Address"),
            (string)el.Attribute("Phone"),
            (string)el.Attribute("Email"),
            (string)el.Attribute("Gender"),
            Environment.NewLine
        )
    )
    .Aggregate(
        new StringBuilder(),
        (sb, s) => sb.Append(s),
        sb => sb.ToString()
    );
string header = "ID,Name,Address,Phone,Email,Gender" + Environment.NewLine;
File.WriteAllText("yourCSV.csv", header + csv);

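StringBuilder helps you build the result efficiently. If you concatenate element by element instead, every concatenation creates a new string, which can hurt performance depending on the number of elements. I recommend reading Jon Skeet's article on this for more details.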
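To make the difference concrete, here is a minimal sketch of accumulating rows in a single StringBuilder. The class and method names (`RowJoiner`, `JoinRows`) are made up for illustration and are not part of the answer's code.

```csharp
using System.Collections.Generic;
using System.Text;

public static class RowJoiner
{
    // Hypothetical helper, not from the original answer.
    // Appends every row to one StringBuilder instead of using `result += row`,
    // which would allocate a new string on each iteration.
    public static string JoinRows(IEnumerable<string> rows)
    {
        var sb = new StringBuilder();
        foreach (string row in rows)
        {
            // AppendLine adds the row followed by Environment.NewLine.
            sb.AppendLine(row);
        }
        return sb.ToString();
    }
}
```

Calling `RowJoiner.JoinRows(rows)` on the sequence of formatted CSV rows would play the same role as the `Aggregate` call in the query.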

Update 1

You can also build the header like this:

var templateList = new List<string> // Treat this as my template
        {
           "ID",
           "Name",
           "Address",
           "Phone",
           "Email",
           "Gender"
        };
// Join the column names, then terminate the header line
var header = String.Join(",", templateList) + Environment.NewLine;

Update 2

To do the same thing dynamically, change my earlier query as follows:

var templateList = new List<string> // Treat this as my template
        {
           "ID",
           "Name",
           "Address",
           "Phone",
           "Email",
           "Gender"
        };
string csv = students.Elements()
                     .Where(el => el.Name != "TRXC") // skip TRXC rows
                     .Select(el =>
                         // Look up each template column on the element;
                         // a missing attribute yields null, which Join writes as an empty field.
                         string.Join(",", templateList.Select(name => (string)el.Attribute(name)))
                         + Environment.NewLine)
                     .Aggregate(new StringBuilder(),
                                (sb, s) => sb.Append(s),
                                sb => sb.ToString()
                               );

Answer 1 (score: 1)

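XmlReader is the fastest way to parse XML. You did not show your XmlReader code, so you may be doing something wrong there. Either way, I would bet that this approach is the fastest.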

Try this code:

var templateList = new List<string> { "ID", "Name", "Address", "Phone", "Email", "Gender" };

using (var xmlReader = XmlReader.Create("test.xml"))
using (var csvWriter = new StreamWriter("test.csv"))
{
    csvWriter.WriteLine("\"" + string.Join("\",\"", templateList) + "\"");
    xmlReader.MoveToContent();

    while (xmlReader.Read())
    {
        if (xmlReader.NodeType == XmlNodeType.Element
            && xmlReader.Name != "TRXC")
        {
            csvWriter.Write('"');

            for (int i = 0; i < templateList.Count; i++)
            {
                if (xmlReader.MoveToAttribute(templateList[i]))
                    csvWriter.Write(xmlReader.Value);
                else
                    csvWriter.Write("NULL");

                if (i < templateList.Count - 1)
                    csvWriter.Write("\",\"");
            }
            csvWriter.WriteLine('"');
        }
    }
}

Test the code and report the results. If it turns out to be slow (really?), we can try to speed it up with a NameTable.
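As a rough sketch of what the NameTable idea could look like: the reader is given a NameTable in which "TRXC" is registered up front, so the element-name check becomes a reference comparison instead of a string comparison. The `NameTableDemo` class and `ReadRowNames` method are hypothetical names for illustration, and whether this gives a measurable speedup on the real file is an assumption that would need to be measured.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class NameTableDemo
{
    // Hypothetical helper: returns the names of all row elements except TRXC.
    public static List<string> ReadRowNames(TextReader input)
    {
        var nameTable = new NameTable();
        // Add returns the atomized string instance that the reader reuses for this name.
        object trxc = nameTable.Add("TRXC");

        var settings = new XmlReaderSettings { NameTable = nameTable };
        var names = new List<string>();

        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            reader.MoveToContent(); // position on the root element
            while (reader.Read())
            {
                // Reference comparison against the atomized name,
                // rather than comparing the strings character by character.
                if (reader.NodeType == XmlNodeType.Element
                    && !ReferenceEquals(reader.LocalName, trxc))
                {
                    names.Add(reader.LocalName);
                }
            }
        }
        return names;
    }
}
```

The same `settings` object could be passed to `XmlReader.Create("test.xml", settings)` in the code above, with the `xmlReader.Name != "TRXC"` check swapped for the reference comparison.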

What is your end goal? It seems odd to parse the XML, create a CSV, and then immediately parse that CSV to create something else.

OK, take a look at this code:

IEnumerable<string[]> ParseXml()
{
    var templateList = new List<string> { "ID", "Name", "Address", "Phone", "Email", "Gender" };

    using (var xmlReader = XmlReader.Create("test.xml"))
    {
        yield return templateList.ToArray(); // header
        xmlReader.MoveToContent();

        while (xmlReader.Read())
        {
            if (xmlReader.NodeType == XmlNodeType.Element
                && xmlReader.Name != "TRXC") // exclude TRXC
            {
                string[] result = new string[templateList.Count];

                for (int i = 0; i < templateList.Count; i++)
                {
                    if (xmlReader.MoveToAttribute(templateList[i]))
                        result[i] = xmlReader.Value;
                    else
                        result[i] = "NULL";
                }
                yield return result; // each row
            }
        }
    }
}

Usage:

foreach (string[] row in ParseXml())
{
    // process row
    // or
    foreach (string value in row)
    {
        // process value
    }
}

We do not create an intermediate CSV. Each row is processed immediately, while the XML is being parsed.

Answer 2 (score: 0)

You can write the data to a MemoryStream through a StreamWriter, and then read the MemoryStream back with a StreamReader, as shown below:

            var templateList = new List<string> 
            {
               "ID",
               "Name",
               "Address",
               "Phone",
               "Email",
               "Gender"     
            };

            using (XmlReader xmlReader = XmlReader.Create("MyXMLFile.xml"))
            {
                using (MemoryStream memoryStream = new MemoryStream())
                {
                    using (StreamWriter csvWriter = new StreamWriter(memoryStream))
                    {
                        csvWriter.WriteLine("\"" + string.Join("\",\"", templateList) + "\"");
                        xmlReader.MoveToContent();
                        while (xmlReader.Read())
                        {
                            if (xmlReader.NodeType == XmlNodeType.Element &&
                                (xmlReader.Name == "TRX" || xmlReader.Name == "TRXR"))
                            {
                                csvWriter.Write('"');

                                for (int i = 0; i < templateList.Count; i++)
                                {
                                    csvWriter.Write(xmlReader.MoveToAttribute(templateList[i]) ? xmlReader.Value.Replace(",", string.Empty) : null);

                                    if (i < templateList.Count - 1)
                                        csvWriter.Write("\",\"");
                                }
                                csvWriter.WriteLine('"');
                            }
                        }
                        csvWriter.Flush();
                        memoryStream.Position = 0;
                        using (StreamReader streamReader = new StreamReader(memoryStream))
                        {
                            string line;
                            while ((line = streamReader.ReadLine()) != null)
                            {
                                // Parse your CSV line by line; `line` holds the current row
                            }
                        }
                    }
                }
            }
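The code above removes commas from attribute values so they cannot break the CSV. Since every field is already wrapped in double quotes, an alternative under the common CSV convention (RFC 4180) is to keep the commas and double any embedded quote characters instead. A minimal sketch, with `CsvField.Quote` as a hypothetical helper name:

```csharp
public static class CsvField
{
    // Hypothetical helper, not from the original answer.
    // Wraps a value in double quotes and doubles any embedded quote,
    // so commas inside the value stay part of the field.
    public static string Quote(string value)
    {
        if (value == null)
        {
            return "\"\""; // a missing value becomes an empty quoted field
        }
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }
}
```

With such a helper, each field would be written as `csvWriter.Write(CsvField.Quote(value))` followed by a plain comma between fields, and the CSV parser on the reading side would need to understand quoted fields.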