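I'm new to C#, and I know this is a tough and challenging task. I have a case where I need to convert XML to CSV based on a template. I've listed the template, a sample XML, and the expected CSV below.
My template contains a set of columns, which should become the headers of the output CSV. We want to match each template column against the XML and check whether it exists. If the template column exists in the XML, its value is added to the CSV; if it doesn't, an empty value is added, as in the example below.
Also, the sample XML has three types of rows (TRX, TRXR, and TRXC).
We have to consider the first two types of rows and ignore the TRXC rows in the XML.
Given template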
var templateList = new List<string> // Consider this as my template
{
"ID",
"Name",
"Address",
"Phone",
"Email",
"Gender"
};
Sample XML
<?xml version="1.0" encoding="utf-8"?>
<StudentXML>
<TRX ID="2" Name="Smita" Address="Pune" Gender="F" Phone="987654321"/>
<TRX ID="2" Name="Ram" Phone="3554321" Email="ram@mail.com" />
<TRX ID="1" Name="John" Address="Mumbai" Phone="NULL" Email="John@mail.com" Gender="M" />
<TRXR ID="3" Name="NULL" Address="Mumbai" Phone="121212" Email="Don@mail.com" Gender="M" />
<TRXC ID="3" Name="Prem" Address="Mumbai" Phone="121212" Email="Prem@mail.com" Gender="M"/>
</StudentXML>
Expected output
"ID", "Name", "Address", "Phone", "Email", "Gender"
"2", "Smita", "Pune", "987654321", "NULL", "F"
"2", "Ram", "NULL", "3554321", "ram@mail.com", "NULL"
"1", "John", "Mumbai", "NULL", "John@mail.com", "M"
"3", "NULL", "Mumbai", "121212", "Don@mail.com", "M"
I tried using an XML reader, but converting the XML to CSV takes too long for an XML file with 600,000 rows.
var dataSet = new DataSet();
dataSet.ReadXml("XML File Name"); // This line takes far too long
I'd be very grateful if someone could help me from here.
Answer 0 (score: 1)
You can use LINQ to XML to read the data and then process it into CSV, like this:
// Requires: using System; using System.IO; using System.Linq;
//           using System.Text; using System.Xml.Linq;
XElement students = XElement.Load("YourXml.xml");
string csv =
    (from el in students.Elements()
     where el.Name != "TRXC" // ignore TRXC rows
     select
         // A missing attribute casts to null, which is written as an empty field.
         String.Format("{0},{1},{2},{3},{4},{5}{6}",
             (string)el.Attribute("ID"),
             (string)el.Attribute("Name"),
             (string)el.Attribute("Address"),
             (string)el.Attribute("Phone"),
             (string)el.Attribute("Email"),
             (string)el.Attribute("Gender"),
             Environment.NewLine
         )
    )
    .Aggregate(
        new StringBuilder(),
        (sb, s) => sb.Append(s),
        sb => sb.ToString()
    );
string header = "ID,Name,Address,Phone,Email,Gender" + Environment.NewLine;
File.WriteAllText("yourCSV.csv", header + csv);
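StringBuilder will help you build the result efficiently. If you concatenate element by element instead, every concatenation creates a new string, which can hurt performance depending on the number of elements. I recommend reading Jon Skeet's article for more details.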
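To illustrate the point about StringBuilder, here is a minimal sketch. The class and method names (CsvBuilderSketch, BuildCsv) are made up for this example and are not part of the answer's code; it only shows the append pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class CsvBuilderSketch
{
    // Hypothetical helper: joins already-formatted CSV lines into one string.
    public static string BuildCsv(IEnumerable<string> lines)
    {
        var sb = new StringBuilder();
        foreach (string line in lines)
        {
            // Append copies into the builder's internal buffer instead of
            // allocating a new string for every concatenation.
            sb.Append(line).Append(Environment.NewLine);
        }
        return sb.ToString();
    }
}
```

With `result += line` in the same loop, each iteration would copy everything accumulated so far, so the total work grows roughly quadratically with the number of rows.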
You can also build the header from the template:
var templateList = new List<string> // Consider this as my template
{
    "ID",
    "Name",
    "Address",
    "Phone",
    "Email",
    "Gender"
};
var header = String.Join(",", templateList) + Environment.NewLine;
To do the same thing dynamically, change my earlier query as follows:
var templateList = new List<string> // Consider this as my template
{
    "ID",
    "Name",
    "Address",
    "Phone",
    "Email",
    "Gender"
};
string csv = students.Elements()
    .Where(el => el.Name != "TRXC")
    // A missing attribute casts to null, which String.Join writes as an empty field.
    .Select(el => String.Join(",", templateList.Select(name => (string)el.Attribute(name)))
                  + Environment.NewLine)
    .Aggregate(new StringBuilder(),
        (sb, s) => sb.Append(s),
        sb => sb.ToString()
    );
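Answer 1 (score: 1)
XmlReader is the fastest way to parse XML. You didn't show your XmlReader code, so you may have been doing something wrong. Either way, I'd bet this approach is the fastest.
Try this code: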
var templateList = new List<string> { "ID", "Name", "Address", "Phone", "Email", "Gender" };
using (var xmlReader = XmlReader.Create("test.xml"))
using (var csvWriter = new StreamWriter("test.csv"))
{
    csvWriter.WriteLine("\"" + string.Join("\",\"", templateList) + "\"");
    xmlReader.MoveToContent();
    while (xmlReader.Read())
    {
        if (xmlReader.NodeType == XmlNodeType.Element
            && xmlReader.Name != "TRXC")
        {
            csvWriter.Write('"');
            for (int i = 0; i < templateList.Count; i++)
            {
                if (xmlReader.MoveToAttribute(templateList[i]))
                    csvWriter.Write(xmlReader.Value);
                else
                    csvWriter.Write("NULL");
                if (i < templateList.Count - 1)
                    csvWriter.Write("\",\"");
            }
            csvWriter.WriteLine('"');
        }
    }
}
Test the code and report the results. If it still turns out to be slow, we can try using a NameTable to speed things up.
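For reference, the NameTable idea could look roughly like the sketch below. It assumes the names to compare are added to the table up front and that the reader is created with that table through XmlReaderSettings, so element names can be compared by reference rather than character by character. NameTableSketch and ReadRowNames are names invented for this illustration, and whether it gives a measurable gain on the 600,000-row file would have to be measured.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class NameTableSketch
{
    // Hypothetical helper: returns the names of all row elements except TRXC.
    public static List<string> ReadRowNames(TextReader input)
    {
        var nameTable = new NameTable();
        // Add returns the atomized instance that the reader will also hand back.
        string skipped = nameTable.Add("TRXC");
        var settings = new XmlReaderSettings { NameTable = nameTable };
        var names = new List<string>();
        using (var reader = XmlReader.Create(input, settings))
        {
            reader.MoveToContent(); // position on the root element
            while (reader.Read())
            {
                // Reference comparison against the atomized name.
                if (reader.NodeType == XmlNodeType.Element
                    && !object.ReferenceEquals(reader.LocalName, skipped))
                {
                    names.Add(reader.LocalName);
                }
            }
        }
        return names;
    }
}
```

The attribute lookups in the loop above could be treated the same way by adding each template column name to the table before reading.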
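What is your end goal? It seems odd to parse the XML, create a CSV, and then immediately parse that CSV to create something else.
OK, take a look at this code: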
IEnumerable<string[]> ParseXml()
{
    var templateList = new List<string> { "ID", "Name", "Address", "Phone", "Email", "Gender" };
    using (var xmlReader = XmlReader.Create("test.xml"))
    {
        yield return templateList.ToArray(); // header
        xmlReader.MoveToContent();
        while (xmlReader.Read())
        {
            if (xmlReader.NodeType == XmlNodeType.Element
                && xmlReader.Name != "TRXC") // exclude TRXC
            {
                string[] result = new string[templateList.Count];
                for (int i = 0; i < templateList.Count; i++)
                {
                    if (xmlReader.MoveToAttribute(templateList[i]))
                        result[i] = xmlReader.Value;
                    else
                        result[i] = "NULL";
                }
                yield return result; // each row
            }
        }
    }
}
Usage:
foreach (string[] row in ParseXml())
{
    // process row
    // or
    foreach (string value in row)
    {
        // process value
    }
}
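We don't create an intermediate CSV. Each row is processed as soon as it is parsed from the XML.
Answer 2 (score: 0)
You can use a MemoryStream to hold the data written by a StreamWriter, and then read the MemoryStream back with a StreamReader, as shown below: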
var templateList = new List<string>
{
    "ID",
    "Name",
    "Address",
    "Phone",
    "Email",
    "Gender"
};
using (XmlReader xmlReader = XmlReader.Create("MyXMLFile.xml"))
{
    using (MemoryStream memoryStream = new MemoryStream())
    {
        using (StreamWriter csvWriter = new StreamWriter(memoryStream))
        {
            csvWriter.WriteLine("\"" + string.Join("\",\"", templateList) + "\"");
            xmlReader.MoveToContent();
            while (xmlReader.Read())
            {
                if (xmlReader.NodeType == XmlNodeType.Element &&
                    (xmlReader.Name == "TRX" || xmlReader.Name == "TRXR"))
                {
                    csvWriter.Write('"');
                    for (int i = 0; i < templateList.Count; i++)
                    {
                        // Strip commas from values; a missing attribute is written as an empty field.
                        csvWriter.Write(xmlReader.MoveToAttribute(templateList[i]) ? xmlReader.Value.Replace(",", string.Empty) : null);
                        if (i < templateList.Count - 1)
                            csvWriter.Write("\",\"");
                    }
                    csvWriter.WriteLine('"');
                }
            }
            // Push buffered text into the stream, then rewind it for reading.
            csvWriter.Flush();
            memoryStream.Position = 0;
            using (StreamReader streamReader = new StreamReader(memoryStream))
            {
                string line;
                while ((line = streamReader.ReadLine()) != null)
                {
                    // Parse your CSV line by line using the line variable
                }
            }
        }
    }
}
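Removing commas changes the data. Because every field here is already wrapped in double quotes, a comma inside a value is valid CSV; what does need handling is an embedded double quote, which the usual CSV convention (RFC 4180) escapes by doubling it. A small sketch of such a helper follows. CsvEscapeSketch and Quote are hypothetical names, not part of the answer above.

```csharp
using System;

static class CsvEscapeSketch
{
    // Hypothetical helper: wraps a field in quotes and doubles embedded quotes.
    public static string Quote(string value)
    {
        // Assumption: a missing value becomes an empty quoted field.
        if (value == null) return "\"\"";
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }
}
```

With a helper like this, the loop could write Quote(xmlReader.Value) for each field and a plain comma between fields, instead of writing the quote characters by hand.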