With the help of XmlWriter and LINQ to XML I produce some huge XML files (several GB). The files look like this:
<Table recCount="" recLength="">
<Rec recId="1">..</Rec>
<Rec recId="2">..</Rec>
..
<Rec recId="n">..</Rec>
</Table>
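I do not know the values of the Table node's recCount and recLength attributes until I have written all of the inner Rec nodes, so I have to write those attribute values at the very end.
Right now I write all the inner Rec nodes to a temporary file, calculate the Table attribute values, and then write everything to the result file in the form shown above (copying all the Rec nodes over from the temporary file).
Is there a way to modify the values of these attributes without writing the content to another file (as I do now) and without loading the whole document into memory (which is obviously impossible given the size of these files)?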
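Answer 0 (score: 1)
Heavily commented code follows. The basic idea is that in the first pass we write:
<?xml version="1.0" encoding="utf-8"?>
<Table recCount="$1" recLength="$2">
<!--Reserved space:++++++++++++++++-->
<Rec...
Then we go back to the beginning of the file and rewrite the first three lines:
<?xml version="1.0" encoding="utf-8"?>
<Table recCount="1000" recLength="150">
<!--Reserved space:#############-->
The important "trick" here is that you cannot "insert" into a file, you can only overwrite it. So we "reserve" some space for the digits (the Reserved space:############# comment). There are many ways to do this. For example, in the first pass we could write blank, fixed-width attribute values:
<Table recCount="          " recLength="          ">
and then overwrite them (XML-legal but ugly):
<Table recCount="1000      " recLength="150       ">
Or we could append the spaces after the > of Table:
<Table recCount="" recLength="">                    
(there are 20 spaces after the >)
Then:
<Table recCount="1000" recLength="150">             
(now there are 13 spaces after the >)
Or we could simply put the spaces on a new line, without the <!-- --> around them...
The code: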
// Requires: using System.Collections.Generic; using System.IO; using System.Text; using System.Xml;
int maxRecCountLength = 10; // int.MaxValue.ToString().Length
int maxRecLengthLength = 10; // int.MaxValue.ToString().Length
int tokenLength = 4; // 4 == $1 + $2, see below what $1 and $2 are
// Note that the reserved space will be in the form +++++++++++++++++++
string reservedSpace = new string('+', maxRecCountLength + maxRecLengthLength - tokenLength);
// You have to manually open the FileStream
using (var fs = new FileStream("out.xml", FileMode.Create))
// and add a StreamWriter on top of it
using (var sw = new StreamWriter(fs, Encoding.UTF8, 4096, true))
{
// Here you write on your StreamWriter however you want.
// Note that recCount and recLength have a placeholder $1 and $2.
int recCount = 0;
int maxRecLength = 0;
using (var xw = XmlWriter.Create(sw))
{
xw.WriteWhitespace("\r\n");
xw.WriteStartElement("Table");
xw.WriteAttributeString("recCount", "$1");
xw.WriteAttributeString("recLength", "$2");
// You have to add some white space that will be
// partially replaced by the recCount and recLength value
xw.WriteWhitespace("\r\n");
xw.WriteComment("Reserved space:" + reservedSpace);
// <--------- BEGIN YOUR CODE
for (int i = 0; i < 100; i++)
{
xw.WriteWhitespace("\r\n");
xw.WriteStartElement("Rec");
string str = string.Format("Some number: {0}", i);
if (str.Length > maxRecLength)
{
maxRecLength = str.Length;
}
xw.WriteValue(str);
recCount++;
xw.WriteEndElement();
}
// <--------- END YOUR CODE
xw.WriteWhitespace("\r\n");
xw.WriteEndElement();
}
sw.Flush();
// Now we read the first lines to modify them (normally we will
// read three lines: the xml header, the <Table element and the
// <!--Reserved space: comment)
fs.Position = 0;
var lines = new List<string>();
using (var sr = new StreamReader(fs, sw.Encoding, false, 4096, true))
{
while (true)
{
string str = sr.ReadLine();
lines.Add(str);
if (str.StartsWith("<Table"))
{
// We read the next line, the comment line
str = sr.ReadLine();
lines.Add(str);
break;
}
}
}
string strCount = XmlConvert.ToString(recCount);
string strMaxRecLength = XmlConvert.ToString(maxRecLength);
// We do some replaces for the tokens
int oldLen = lines[lines.Count - 2].Length;
lines[lines.Count - 2] = lines[lines.Count - 2].Replace("=\"$1\"", string.Format("=\"{0}\"", strCount));
lines[lines.Count - 2] = lines[lines.Count - 2].Replace("=\"$2\"", string.Format("=\"{0}\"", strMaxRecLength));
int newLen = lines[lines.Count - 2].Length;
// Remove spaces from reserved whitespace
lines[lines.Count - 1] = lines[lines.Count - 1].Replace(":" + reservedSpace, ":" + new string('#', reservedSpace.Length - newLen + oldLen));
// We move back to just after the UTF8/UTF16 preamble
fs.Position = sw.Encoding.GetPreamble().Length;
// And we rewrite the lines
foreach (string str in lines)
{
sw.Write(str);
sw.Write("\r\n");
}
}
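The slow .NET 3.5 way
In .NET 3.5, StreamReader and StreamWriter always close the underlying FileStream (the leaveOpen constructor argument used above is not available there), so the file has to be reopened several times. This is a little slower.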
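The padded-attribute variant described in this answer can also be sketched without re-reading the header. This is only an illustration under stated assumptions, not the answer's own code: the output path, the Width constant, and the Overwrite helper are hypothetical names, the records are dummy data, and it assumes a C# version with top-level statements. The file is written as UTF-8 without a BOM so that, for the pure-ASCII header, byte offsets equal character offsets.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

const int Width = 10; // int.MaxValue.ToString().Length, the widest value we must fit
string path = Path.Combine(Path.GetTempPath(), "padded-table.xml"); // hypothetical output path
string blank = new string(' ', Width);
string head = "<Table recCount=\"";
string middle = "\" recLength=\"";

int recCount = 0, maxRecLength = 0;

using (var fs = new FileStream(path, FileMode.Create, FileAccess.ReadWrite))
{
    // First pass: header with blank fixed-width values, then the records.
    using (var sw = new StreamWriter(fs, new UTF8Encoding(false), 4096, true))
    {
        sw.Write(head + blank + middle + blank + "\">\r\n");
        for (int i = 1; i <= 100; i++) // dummy records standing in for the real data
        {
            string text = "Some number: " + i;
            maxRecLength = Math.Max(maxRecLength, text.Length);
            recCount++;
            sw.Write(new XElement("Rec", new XAttribute("recId", i), text) + "\r\n");
        }
        sw.Write("</Table>");
    } // disposing the writer flushes it; leaveOpen keeps fs usable

    // Second pass: overwrite only the reserved bytes, the file length does not change.
    Overwrite(fs, head.Length, recCount);
    Overwrite(fs, head.Length + Width + middle.Length, maxRecLength);
}

// Hypothetical helper: writes value, right-padded with spaces to Width, at the given byte offset.
void Overwrite(FileStream stream, long offset, int value)
{
    byte[] bytes = Encoding.ASCII.GetBytes(XmlConvert.ToString(value).PadRight(Width));
    stream.Position = offset;
    stream.Write(bytes, 0, bytes.Length);
}
```

The attribute values end up with trailing spaces, so a consumer has to tolerate them; the explicit (int) conversion on an XAttribute does. The trade-off against the comment-based approach is simpler offset arithmetic at the cost of the "ugly" padded values.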
Answer 1 (score: 1)
Try the following approach.
You can give the attributes default values in an external XML schema.
When creating the XML document, do not create these attributes. Like this:
int count = 5;
int length = 42;
var writerSettings = new XmlWriterSettings { Indent = true };
using (var writer = XmlWriter.Create("data.xml", writerSettings))
{
writer.WriteStartElement("Table");
for (int i = 1; i <= count; i++)
{
writer.WriteStartElement("Rec");
writer.WriteAttributeString("recId", i.ToString());
writer.WriteString("..");
writer.WriteEndElement();
}
}
The resulting XML looks like this:
<?xml version="1.0" encoding="utf-8"?>
<Table>
<Rec recId="1">..</Rec>
<Rec recId="2">..</Rec>
<Rec recId="3">..</Rec>
<Rec recId="4">..</Rec>
<Rec recId="5">..</Rec>
</Table>
Now create an XML schema for this document that specifies default values for the desired attributes:
string ns = "http://www.w3.org/2001/XMLSchema";
using (var writer = XmlWriter.Create("data.xsd", writerSettings))
{
writer.WriteStartElement("xs", "schema", ns);
writer.WriteStartElement("xs", "element", ns);
writer.WriteAttributeString("name", "Table");
writer.WriteStartElement("xs", "complexType", ns);
writer.WriteStartElement("xs", "sequence", ns);
writer.WriteStartElement("xs", "any", ns);
writer.WriteAttributeString("processContents", "skip");
writer.WriteAttributeString("maxOccurs", "unbounded");
writer.WriteEndElement();
writer.WriteEndElement();
writer.WriteStartElement("xs", "attribute", ns);
writer.WriteAttributeString("name", "recCount");
writer.WriteAttributeString("default", count.ToString()); // <--
writer.WriteEndElement();
writer.WriteStartElement("xs", "attribute", ns);
writer.WriteAttributeString("name", "recLength");
writer.WriteAttributeString("default", length.ToString()); // <--
writer.WriteEndElement();
}
Or, more simply, create the schema like this:
XNamespace xs = "http://www.w3.org/2001/XMLSchema";
var schema = new XElement(xs + "schema",
new XElement(xs + "element", new XAttribute("name", "Table"),
new XElement(xs + "complexType",
new XElement(xs + "sequence",
new XElement(xs + "any",
new XAttribute("processContents", "skip"),
new XAttribute("maxOccurs", "unbounded")
)
),
new XElement(xs + "attribute",
new XAttribute("name", "recCount"),
new XAttribute("default", count) // <--
),
new XElement(xs + "attribute",
new XAttribute("name", "recLength"),
new XAttribute("default", length) // <--
)
)
)
);
schema.Save("data.xsd");
Note the lines that write the variables count and length (marked with <--): that is where your own data should go.
The generated schema looks like this:
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="Table">
<xs:complexType>
<xs:sequence>
<xs:any processContents="skip" maxOccurs="unbounded" />
</xs:sequence>
<xs:attribute name="recCount" default="5" />
<xs:attribute name="recLength" default="42" />
</xs:complexType>
</xs:element>
</xs:schema>
Now, when reading the XML document, you must add this schema; the default attribute values will be taken from it:
XElement xml;
var readerSettings = new XmlReaderSettings();
readerSettings.ValidationType = ValidationType.Schema; // <--
readerSettings.Schemas.Add("", "data.xsd"); // <--
using (var reader = XmlReader.Create("data.xml", readerSettings)) // <--
{
xml = XElement.Load(reader);
}
xml.Save(Console.Out);
Console.WriteLine();
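XElement.Load materializes the whole document, which may not be practical for the multi-gigabyte files in the question. A minimal streaming sketch follows, assuming the same schema shape as in this answer; the schema and a two-record sample document are inlined as strings purely to keep the example self-contained, and it assumes a C# version with top-level statements. It relies on the validating XmlReader exposing schema-defaulted attributes on the Table start element.

```csharp
using System;
using System.IO;
using System.Xml;

// Same schema as above, inlined; 5 and 42 are the sample defaults used in this answer.
string xsd = @"<xs:schema xmlns:xs=""http://www.w3.org/2001/XMLSchema"">
  <xs:element name=""Table"">
    <xs:complexType>
      <xs:sequence>
        <xs:any processContents=""skip"" maxOccurs=""unbounded"" />
      </xs:sequence>
      <xs:attribute name=""recCount"" default=""5"" />
      <xs:attribute name=""recLength"" default=""42"" />
    </xs:complexType>
  </xs:element>
</xs:schema>";

// Tiny stand-in for the real data file; Table carries no attributes of its own.
string xml = @"<Table><Rec recId=""1"">..</Rec><Rec recId=""2"">..</Rec></Table>";

var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema };
settings.Schemas.Add("", XmlReader.Create(new StringReader(xsd)));

string recCount = null, recLength = null;
using (var reader = XmlReader.Create(new StringReader(xml), settings))
{
    reader.MoveToContent(); // stops on <Table>; no Rec element has been read yet
    recCount = reader.GetAttribute("recCount");
    recLength = reader.GetAttribute("recLength");
}

Console.WriteLine("recCount={0}, recLength={1}", recCount, recLength);
```

With a real file, XmlReader.Create("data.xml", settings) would replace the StringReader, and the Rec elements can then be processed one at a time with reader.Read().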
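Answer 2 (score: 0)
You could try loading the XML file into a DataSet, since that makes it easier to compute the attributes. Memory management is also handled by the DataSet layer. Why not give it a try and let us know the result?
Answer 3 (score: 0)
I think the FileStream class will be helpful to you. Take a look at its Read and Write methods.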