What I have:
A large XML file with close to a million lines. Sample content:
<etc35yh3 etc="numbers" etc234="a" etc345="date"><something><some more something></some more something></something></etc35yh3>
<etc123 etc="numbers" etc234="a" etc345="date"><something><some more something></some more something></something></etc123>
<etc15y etc="numbers" etc234="a" etc345="date"><something><some more something></some more something></something></etc15y>
^ repeated for roughly 900k lines (with different content, of course)
What I need:
Search the XML file for "<etc123". Once it is found, move (write) that line and every line below it to a separate XML file.
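Is it wise to use something like File.ReadAllLines for the search part? What would you suggest for the writing part? As far as I can tell, going line by line is not an option because it would take too long.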
Answer 0 (score: 4)
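To discard everything above the search string, I would not use File.ReadAllLines, because it loads the entire file into memory. Try File.Open and wrap the stream in a StreamReader. Loop on StreamReader.ReadLine until you hit the line, then start writing to a new StreamWriter, or do a byte copy on the underlying file stream.
Below is an example that uses only StreamReader/StreamWriter.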
using System;
using System.IO;
//load the input file
//open with read and sharing
using (FileStream fsInput = new FileStream("input.txt",
FileMode.Open, FileAccess.Read, FileShare.Read))
{
//use streamreader to search for start
var srInput = new StreamReader(fsInput);
string searchString = "<etc123";
string cSearch = null;
bool found = false;
while ((cSearch = srInput.ReadLine()) != null)
{
if (cSearch.StartsWith(searchString, StringComparison.Ordinal)) // XML names are case-sensitive
{
found = true;
break;
}
}
if (!found)
throw new Exception("Searched string not found.");
//we have the data, write to a new file
using (StreamWriter sw = new StreamWriter(
new FileStream("out.txt", FileMode.Create, //create or overwrite (OpenOrCreate would not truncate)
FileAccess.Write, FileShare.None))) // write only, no sharing
{
//write the line that we found in the search
sw.WriteLine(cSearch);
string cline = null;
while ((cline = srInput.ReadLine()) != null)
sw.WriteLine(cline);
}
}
//both files are closed and complete
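The answer also mentions doing a byte copy on the underlying file stream instead. Because StreamReader reads ahead into its own buffer, fsInput.Position after the search loop does not point at the matching line, so a byte-copy version has to track offsets itself. The sketch below scans the raw bytes line by line, and once a line contains the marker it seeks back to the start of that line and hands the rest of the file to Stream.CopyTo, so the tail is copied verbatim without being decoded and re-encoded. It is a minimal sketch, assuming UTF-8 or ASCII input with '\n' line endings; the helper CopyFromMarker and the file names in.xml/out.xml are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: copy inputPath from the start of the first line that
// contains `marker` through end of file, byte for byte, into outputPath.
// Assumes UTF-8/ASCII text with '\n' line endings ("\r\n" also works: the '\r'
// simply stays part of the line). Returns false if no line matches.
static bool CopyFromMarker(string inputPath, string outputPath, string marker)
{
    using var input = new FileStream(inputPath, FileMode.Open, FileAccess.Read, FileShare.Read);
    using var line = new MemoryStream();   // bytes of the line being scanned
    long lineStart = 0;                    // byte offset where that line begins
    long offset = 0;                       // byte offset of the next byte to read

    while (true)
    {
        int b = input.ReadByte();
        if (b != -1)
        {
            offset++;
            if (b != '\n')
            {
                line.WriteByte((byte)b);
                continue;   // still inside the current line
            }
        }

        // Reached '\n' or end of file: test the completed line.
        string text = Encoding.UTF8.GetString(line.GetBuffer(), 0, (int)line.Length);
        if (text.Contains(marker))
        {
            input.Seek(lineStart, SeekOrigin.Begin);   // rewind to the matching line
            using var output = new FileStream(outputPath, FileMode.Create, FileAccess.Write, FileShare.None);
            input.CopyTo(output);                      // copy the remainder unchanged
            return true;
        }
        if (b == -1)
        {
            return false;
        }
        line.SetLength(0);      // start collecting the next line
        lineStart = offset;     // it begins right after this '\n'
    }
}

// Usage with hypothetical file names:
File.WriteAllText("in.xml", "<a x=\"1\"/>\n<etc123 y=\"2\"/>\n<b/>\n");
bool found = CopyFromMarker("in.xml", "out.xml", "<etc123");
Console.WriteLine(found);   // True
```

Only the lines before the match are decoded (to test them); after that the work is a single buffered stream copy.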
Answer 1 (score: 3)
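You can use LINQ to XML (LINQ2XML) to do the copy. Note that XElement.Load parses the whole file into memory and requires a well-formed document with a single root element, and that this copies every etc123 element rather than everything after the first match.
using System.Xml.Linq;
XElement doc = XElement.Load("yourXML.xml");
// an XDocument can have only one root element, so collect the matches under a new one
XElement newRoot = new XElement("root");
foreach (XElement elm in doc.DescendantsAndSelf("etc123"))
{
newRoot.Add(elm);
}
new XDocument(newRoot).Save("yourOutputXML.xml");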
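XElement.Load builds the entire tree in memory, which can be heavy for a file of nearly a million lines. A common streaming pattern that still uses LINQ to XML is to walk the file with an XmlReader and materialize only the matching elements with XNode.ReadFrom, writing each one straight to an XmlWriter. The sketch below assumes the input may contain several top-level elements, as in the question's sample, hence ConformanceLevel.Fragment; the helper StreamElements, the wrapper element root and the inline sample data are hypothetical. Unlike DescendantsAndSelf, an etc123 nested inside another etc123 comes back as part of its outer element rather than separately.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: stream `input` and yield each element named `name`
// as an XElement, without loading the whole document into memory.
// ConformanceLevel.Fragment accepts input with several top-level elements.
static IEnumerable<XElement> StreamElements(TextReader input, string name)
{
    var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
    using var reader = XmlReader.Create(input, settings);
    while (!reader.EOF)
    {
        if (reader.NodeType == XmlNodeType.Element && reader.Name == name)
        {
            // ReadFrom consumes the whole element and leaves the reader on the
            // following node, so Read() must not be called in this branch.
            yield return (XElement)XNode.ReadFrom(reader);
        }
        else
        {
            reader.Read();
        }
    }
}

// Inline sample standing in for File.OpenText("yourXML.xml"):
string sample =
    "<etc35yh3 etc=\"1\"><something/></etc35yh3>\n" +
    "<etc123 etc=\"2\"><something/></etc123>\n" +
    "<etc15y etc=\"3\"><something/></etc15y>\n" +
    "<etc123 etc=\"4\"/>\n";

int copied = 0;
using (var writer = XmlWriter.Create("yourOutputXML.xml", new XmlWriterSettings { Indent = true }))
{
    writer.WriteStartElement("root");   // one root, so the output is a well-formed document
    foreach (XElement e in StreamElements(new StringReader(sample), "etc123"))
    {
        e.WriteTo(writer);
        copied++;
    }
    writer.WriteEndElement();
}
Console.WriteLine(copied);   // 2
```

Only one matching element is held in memory at a time, so memory use depends on the size of a single etc123 element rather than the whole file.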
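Answer 2 (score: 0)
You can do it one line at a time... If you check the content of each line as you read it, you never need ReadToEnd (or to hold the whole file in memory).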
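using System.IO;
FileInfo file = new FileInfo("MyHugeXML.xml");
FileInfo outFile = new FileInfo("ResultFile.xml");
using(StreamWriter write = outFile.CreateText()) // CreateText returns a StreamWriter (creates or overwrites)
using(StreamReader sr = file.OpenText()) // OpenText returns a StreamReader (OpenRead returns a FileStream)
{
bool foundit = false;
string line;
while((line = sr.ReadLine()) != null)
{
if(!foundit && line.Contains("<etc123"))
{
foundit = true; // start copying at the matching line itself
}
if(foundit)
{
write.WriteLine(line);
}
}
}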
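Note that, depending on your requirements, this approach may not produce valid XML: the copied lines are not wrapped in a single root element, and if the input has a root element, its closing tag is copied without the matching opening tag.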
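One way to address that caveat, assuming the input lines are top-level siblings as in the question's sample (no enclosing root of their own), is to wrap the copied lines in a new root element. The sketch below also uses File.ReadLines, which enumerates the file lazily (unlike File.ReadAllLines), together with LINQ's SkipWhile to drop everything before the first match. The helper CopyWrapped and the root element name root are hypothetical.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper: copy the first line containing `marker` and every line
// after it, wrapped in a new <root> element so the result has exactly one root.
// Assumes each copied line is a complete, top-level XML element.
static void CopyWrapped(string inputPath, string outputPath, string marker)
{
    using var writer = new StreamWriter(outputPath);   // creates or overwrites
    writer.WriteLine("<root>");
    // File.ReadLines streams the file lazily; SkipWhile drops the leading lines.
    foreach (string line in File.ReadLines(inputPath).SkipWhile(l => !l.Contains(marker)))
    {
        writer.WriteLine(line);
    }
    writer.WriteLine("</root>");
}

// Usage with the file names from the answer above:
File.WriteAllLines("MyHugeXML.xml", new[]
{
    "<etc35yh3 etc=\"1\"><something/></etc35yh3>",
    "<etc123 etc=\"2\"><something/></etc123>",
    "<etc15y etc=\"3\"><something/></etc15y>",
});
CopyWrapped("MyHugeXML.xml", "ResultFile.xml", "<etc123");
// ResultFile.xml now holds <root>, the last two input lines, and </root>.
```

If no line matches, the output is just an empty <root></root> pair, which is still well-formed.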