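I have an application that exports data with DataSet.WriteXml and imports it with DataSet.ReadXml. During the import, I need to change certain primary keys as part of the application logic.
With more than 500K records, the XML is written and read back successfully. As soon as I change a primary key, it hangs for a while and then throws an OutOfMemoryException. I believe the reason is that it has to perform a lot of cascading updates. I tried wrapping the primary key change in BeginEdit and EndEdit, but in that case it still fails, this time in EndEdit.
As far as I know, a DataSet also keeps some of the previous data in memory. Is there a way to optimize DataSet update operations so that they consume as little memory as possible?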
Answer 0 (score: 1)
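If you need more control, you have to give up some of the functionality that the DataSet provides for you. One way to reduce the memory consumed by cascading is simply: do not cascade. Update the table IDs manually, using the table schema to find the related columns.
The idea is that you decide which rows get updated, when AcceptChanges is called, when a GC is forced, or anything else you might want to control.
I created a simple test scenario that shows what I mean: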
Schema:
<?xml version="1.0"?>
<xs:schema id="NewDataSet" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xs:element name="NewDataSet" msdata:IsDataSet="true" msdata:UseCurrentLocale="true">
    <xs:complexType>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="Planet">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="ID" type="xs:int" />
              <xs:element name="Name" type="xs:string" minOccurs="0" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="Continent">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="ID" type="xs:int" />
              <xs:element name="PlanetID" type="xs:int" />
              <xs:element name="Name" type="xs:string" minOccurs="0" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="Country">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="ID" type="xs:int" />
              <xs:element name="ContinentID" type="xs:int" />
              <xs:element name="Name" type="xs:string" minOccurs="0" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="County">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="ID" type="xs:int" />
              <xs:element name="CountryID" type="xs:int" />
              <xs:element name="Name" type="xs:string" minOccurs="0" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="City">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="ID" type="xs:int" />
              <xs:element name="CountyID" type="xs:int" />
              <xs:element name="Name" type="xs:string" minOccurs="0" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="Street">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="ID" type="xs:int" />
              <xs:element name="CityID" type="xs:int" minOccurs="0" />
              <xs:element name="Name" type="xs:string" minOccurs="0" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="People">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="ID" type="xs:int" />
              <xs:element name="StreetID" type="xs:int" />
              <xs:element name="Name" type="xs:string" minOccurs="0" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="Job">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="ID" type="xs:int" />
              <xs:element name="PeopleID" type="xs:int" />
              <xs:element name="Name" type="xs:string" minOccurs="0" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="Pets">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="ID" type="xs:int" />
              <xs:element name="PeopleID" type="xs:int" minOccurs="0" />
              <xs:element name="Name" type="xs:string" minOccurs="0" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:complexType>
    <xs:unique name="Constraint1">
      <xs:selector xpath=".//Planet" />
      <xs:field xpath="ID" />
    </xs:unique>
    <xs:unique name="Continent_Constraint1" msdata:ConstraintName="Constraint1">
      <xs:selector xpath=".//Continent" />
      <xs:field xpath="ID" />
    </xs:unique>
    <xs:unique name="Country_Constraint1" msdata:ConstraintName="Constraint1">
      <xs:selector xpath=".//Country" />
      <xs:field xpath="ID" />
    </xs:unique>
    <xs:unique name="County_Constraint1" msdata:ConstraintName="Constraint1">
      <xs:selector xpath=".//County" />
      <xs:field xpath="ID" />
    </xs:unique>
    <xs:unique name="City_Constraint1" msdata:ConstraintName="Constraint1">
      <xs:selector xpath=".//City" />
      <xs:field xpath="ID" />
    </xs:unique>
    <xs:unique name="Street_Constraint1" msdata:ConstraintName="Constraint1">
      <xs:selector xpath=".//Street" />
      <xs:field xpath="ID" />
    </xs:unique>
    <xs:unique name="People_Constraint1" msdata:ConstraintName="Constraint1">
      <xs:selector xpath=".//People" />
      <xs:field xpath="ID" />
    </xs:unique>
    <xs:unique name="Job_Constraint1" msdata:ConstraintName="Constraint1">
      <xs:selector xpath=".//Job" />
      <xs:field xpath="ID" />
    </xs:unique>
    <xs:unique name="Pets_Constraint1" msdata:ConstraintName="Constraint1">
      <xs:selector xpath=".//Pets" />
      <xs:field xpath="ID" />
    </xs:unique>
    <xs:keyref name="Relation8" refer="People_Constraint1">
      <xs:selector xpath=".//Pets" />
      <xs:field xpath="PeopleID" />
    </xs:keyref>
    <xs:keyref name="Relation7" refer="People_Constraint1">
      <xs:selector xpath=".//Job" />
      <xs:field xpath="PeopleID" />
    </xs:keyref>
    <xs:keyref name="Relation6" refer="Street_Constraint1">
      <xs:selector xpath=".//People" />
      <xs:field xpath="StreetID" />
    </xs:keyref>
    <xs:keyref name="Relation5" refer="City_Constraint1">
      <xs:selector xpath=".//Street" />
      <xs:field xpath="CityID" />
    </xs:keyref>
    <xs:keyref name="Relation4" refer="County_Constraint1">
      <xs:selector xpath=".//City" />
      <xs:field xpath="CountyID" />
    </xs:keyref>
    <xs:keyref name="Relation3" refer="Country_Constraint1">
      <xs:selector xpath=".//County" />
      <xs:field xpath="CountryID" />
    </xs:keyref>
    <xs:keyref name="Relation2" refer="Continent_Constraint1">
      <xs:selector xpath=".//Country" />
      <xs:field xpath="ContinentID" />
    </xs:keyref>
    <xs:keyref name="Relation1" refer="Constraint1">
      <xs:selector xpath=".//Continent" />
      <xs:field xpath="PlanetID" />
    </xs:keyref>
  </xs:element>
</xs:schema>
Some code to generate the test data:
// Fills every table of dataSet1 with random rows; child tables get more rows than their parents.
private void CreateRows(Int32 MaxBaseRows, Int32 MaxChildRows)
{
    dataSet1.Clear();
    Int32 RowCount = 0;
    Random R = new Random();
    foreach (DataTable DT in dataSet1.Tables)
    {
        Int32 NewCount = R.Next(1, MaxBaseRows);
        // Multiply the row count once for every foreign key the table has.
        foreach (var FK in DT.Constraints.OfType<ForeignKeyConstraint>())
        {
            NewCount = NewCount * R.Next(1, MaxChildRows);
        }
        for (int i = 0; i < NewCount; i++)
        {
            DataRow DR = DT.NewRow();
            foreach (DataColumn DC in DT.Columns)
            {
                if (DC.ColumnName == "ID")
                {
                    // Sequential primary key: 0, 1, 2, ...
                    DR[DC] = DT.Rows.Count;
                }
                else if (DC.DataType == typeof(Int32))
                {
                    Boolean ValueSet = false;
                    foreach (var FK in DT.Constraints.OfType<ForeignKeyConstraint>())
                    {
                        if (FK.Columns.Contains(DC))
                        {
                            // Foreign key column: point at a random existing parent row.
                            DR[DC] = R.Next(0, FK.RelatedTable.Rows.Count);
                            ValueSet = true;
                        }
                    }
                    if (!ValueSet)
                    {
                        DR[DC] = R.Next(0, 10000);
                    }
                }
                else if (DC.DataType == typeof(String))
                {
                    DR[DC] = String.Format("{0}{1}", DT.TableName, DT.Rows.Count);
                }
            }
            DT.Rows.Add(DR);
            RowCount++;
        }
    }
    label19.Text = RowCount.ToString();
    dataSet1.AcceptChanges();
}
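// Changes one Planet ID and lets the DataSet cascade the change to all descendants.
// BaseRowCount is a field of the form that is not shown here.
private void UpdateUsingCascade()
{
    EnableRelations();
    GC.Collect();
    long Mem = System.GC.GetTotalMemory(false);
    if (dataSet1.Tables["Planet"].Rows.Count > 0)
    {
        dataSet1.Tables["Planet"].Rows[0]["ID"] = new Random().Next(BaseRowCount, BaseRowCount + 10);
    }
    // Memory growth caused by the cascaded update, measured before the changes are accepted.
    Mem = System.GC.GetTotalMemory(false) - Mem;
    // GetChanges returns null when nothing was modified.
    DataSet ds = dataSet1.GetChanges();
    Int32 Changes = ds == null ? 0 : ds.Tables.OfType<DataTable>().Sum(DT => DT.Rows.Count);
    label19.Text = Changes.ToString();
    label21.Text = Mem.ToString();
    dataSet1.AcceptChanges();
}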
// Changes one Planet ID and updates the direct child rows by hand, with cascading switched off.
private void UpdateManually()
{
    DisableRelations();
    GC.Collect();
    long Mem = System.GC.GetTotalMemory(false);
    DataTable DT = dataSet1.Tables["Planet"];
    Int32 ChangeCount = 0;
    if (DT.Rows.Count > 0)
    {
        DataColumn DC = DT.Columns["ID"];
        Int32 oldValue = Convert.ToInt32(DT.Rows[0][DC]);
        DT.Rows[0][DC] = new Random().Next(BaseRowCount + 20, BaseRowCount + 30);
        Int32 newValue = Convert.ToInt32(DT.Rows[0][DC]);
        // Use the schema to find every child column that references the changed key.
        foreach (DataRelation Relation in DT.ChildRelations)
        {
            if (Relation.ParentColumns.Contains(DC))
            {
                foreach (DataColumn CC in Relation.ChildColumns)
                {
                    foreach (DataRow DR in Relation.ChildTable.Rows)
                    {
                        if (Convert.ToInt32(DR[CC]) == oldValue)
                        {
                            DR[CC] = newValue;
                            ChangeCount++;
                            // Deliberately aggressive: commit and collect after every changed row
                            // so that old row versions never accumulate. Slow, but memory stays flat.
                            dataSet1.AcceptChanges();
                            GC.Collect();
                        }
                    }
                }
            }
        }
    }
    Mem = System.GC.GetTotalMemory(false) - Mem;
    label20.Text = ChangeCount.ToString();
    label22.Text = Mem.ToString();
    dataSet1.AcceptChanges();
}
private void EnableRelations()
{
    dataSet1.EnforceConstraints = true;
    foreach (DataRelation Relation in dataSet1.Relations)
    {
        Relation.ChildKeyConstraint.UpdateRule = Rule.Cascade;
    }
}
private void DisableRelations()
{
    dataSet1.EnforceConstraints = false;
    foreach (DataRelation Relation in dataSet1.Relations)
    {
        Relation.ChildKeyConstraint.UpdateRule = Rule.None;
    }
}
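When many keys have to change at once, the manual approach above can be extended to a single pass per table. The following is only a sketch under assumptions, not part of the original test: `KeyRemapper`, `RemapColumn`, `Remap` and the `batchSize` parameter are hypothetical names, the parent key column is assumed to be called "ID" as in the schema above, and cascade rules are assumed to have been set to Rule.None already (as DisableRelations does).

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class KeyRemapper
{
    // Rewrites one int column according to map (old value -> new value).
    // Changes are committed every batchSize rows so that the original
    // row versions kept for change tracking can be released early.
    public static int RemapColumn(DataTable table, string columnName,
                                  IDictionary<int, int> map, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        DataColumn column = table.Columns[columnName];
        int changed = 0;
        // Index-based loop, so no enumerator is open while AcceptChanges runs.
        for (int i = 0; i < table.Rows.Count; i++)
        {
            DataRow row = table.Rows[i];
            // Each row is looked up by its current value exactly once,
            // so chained mappings (0 -> 1, 1 -> 2) are not applied twice.
            if (row[column] is int oldValue && map.TryGetValue(oldValue, out int newValue))
            {
                row[column] = newValue;
                changed++;
                if (changed % batchSize == 0)
                {
                    table.AcceptChanges();
                }
            }
        }
        table.AcceptChanges();
        return changed;
    }

    // Remaps the parent key and one child foreign key column with constraint checks off.
    public static int Remap(DataSet dataSet, string parentTable, string childTable,
                            string childColumn, IDictionary<int, int> map, int batchSize)
    {
        bool enforce = dataSet.EnforceConstraints;
        dataSet.EnforceConstraints = false; // keys are temporarily inconsistent
        try
        {
            int changed = RemapColumn(dataSet.Tables[parentTable], "ID", map, batchSize);
            changed += RemapColumn(dataSet.Tables[childTable], childColumn, map, batchSize);
            return changed;
        }
        finally
        {
            // Restoring true re-validates all constraints and throws if any is violated.
            dataSet.EnforceConstraints = enforce;
        }
    }
}
```

A call such as `KeyRemapper.Remap(dataSet1, "Planet", "Continent", "PlanetID", map, 1000)` would cover one relation; deeper levels of the hierarchy only need touching if their own parent keys change as well.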
Answer 1 (score: 0)
SHCJ - you should use a BufferedStream:
DataSet dataSet = new DataSet();
// Dispose both streams as soon as the DataSet has been loaded.
using (FileStream fileStream = File.OpenRead(pathToYourFile))
using (BufferedStream bufferedStream = new BufferedStream(fileStream))
{
    dataSet.ReadXml(bufferedStream);
}
Update
Please try this for your write operation:
using (XmlWriter xmlWriter = XmlWriter.Create(_pathToYourFile))
{
    /* write operations */
}
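To make the two snippets above concrete, here is a minimal round trip, written as a sketch under assumptions: the `XmlRoundTrip` class, the two-column "Planet" table and the temporary file are illustrative only. It writes a DataSet through an XmlWriter and reads it back through a BufferedStream. Note that this only affects how the file is read and written; it does not change how much memory the loaded DataSet itself needs.

```csharp
using System;
using System.Data;
using System.IO;
using System.Xml;

public static class XmlRoundTrip
{
    // Writes a small DataSet to a temporary file and loads it again.
    // Returns the number of Planet rows found after the reload.
    public static int Run()
    {
        var source = new DataSet("NewDataSet");
        DataTable planet = source.Tables.Add("Planet");
        planet.Columns.Add("ID", typeof(int));
        planet.Columns.Add("Name", typeof(string));
        planet.Rows.Add(0, "Planet0");
        planet.Rows.Add(1, "Planet1");

        string path = Path.GetTempFileName(); // stand-in for the real export path
        try
        {
            using (XmlWriter xmlWriter = XmlWriter.Create(path))
            {
                // WriteSchema embeds the XSD so that column types survive the round trip.
                source.WriteXml(xmlWriter, XmlWriteMode.WriteSchema);
            }

            var loaded = new DataSet();
            using (FileStream fileStream = File.OpenRead(path))
            using (var bufferedStream = new BufferedStream(fileStream))
            {
                loaded.ReadXml(bufferedStream);
            }
            return loaded.Tables["Planet"].Rows.Count;
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```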
Answer 2 (score: 0)
Try this:
try
{
    //Logic to load your file
    var xelmOriginal = new XElement("Root");
    for (int i = 0; i < 500000; i++)
    {
        var item = new XElement("Item");
        item.SetAttributeValue("id", i);
        xelmOriginal.Add(item);
    }
    // Logic to transform each element
    var xelmRootTransformed = new XElement("Root");
    foreach (var element in xelmOriginal.Elements())
    {
        var transformedItem =
            new XElement("Transformed",
                element
                    .Attributes()
                    .Single(x => x.Name.LocalName.Equals("id")));
        xelmRootTransformed.Add(transformedItem);
    }
    //Logic to save your transformed file
}
catch (Exception e)
{
    Console.WriteLine("Failed");
    return;
}
Console.WriteLine("Success");
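The key point here is that you keep your input and your output separate. That is, you do not transform the file and write back to it at the same time; doing so would mess up your enumeration.
Instead, read the file one element at a time and write to a temporary output one element at a time; in theory, only one element is ever alive at any given moment.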
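The sample above still builds both trees fully in memory, so a truly element-at-a-time version needs a streaming reader and writer. One way to sketch it, assuming the same hypothetical `<Root>`/`<Item id="..."/>` layout as the sample (the `StreamingTransform` class and its `Transform` method are illustrative names), is to combine XmlReader, XNode.ReadFrom and XmlWriter:

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class StreamingTransform
{
    // Copies every <Item id="..."/> from input to output as <Transformed id="..."/>.
    // Only one element is materialised as an XElement at any time.
    // Returns the number of elements written.
    public static int Transform(TextReader input, TextWriter output)
    {
        int count = 0;
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlReader reader = XmlReader.Create(input))
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartElement("Root");
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Item")
                {
                    // ReadFrom consumes the whole element and leaves the reader on the
                    // node that follows it, so Read() must not be called in this branch.
                    var item = (XElement)XNode.ReadFrom(reader);
                    var transformed = new XElement("Transformed", item.Attribute("id"));
                    transformed.WriteTo(writer);
                    count++;
                }
                else
                {
                    reader.Read();
                }
            }
            writer.WriteEndElement();
        }
        return count;
    }
}
```

With files, the same method can be called with a StreamReader on the source and a StreamWriter on a temporary output file, which is then renamed over the original once the transform has finished.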
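Answer 3 (score: 0)
DataSets are smart beasts. They not only read, write, hold and filter data, they also track changes, so that later updates, writes and deletes are faster (when working against a database, not just XML files).
It may well be that your DataSet has change tracking enabled, which forces it to remember at all times what the current data is, what the data looked like before, and how the new data relates to the old. If you only use the DataSet as a "container" for the current workload, you do not need caching or change tracking, so just turn it off. That is, if it is possible at all; right now I do not remember whether it is, or how to do it. However, I am fairly sure you can flush the recorded changes by calling .AcceptChanges(), or by discarding the old DS and creating a new DS for each new batch of data you load. The latter will of course not help with an OOM thrown during sequential updates within the current batch. If the OOM is thrown on the very first PK update, AcceptChanges cannot help either: changes can only be "accepted" after a whole operation has finished, not part-way through it. But if the OOM is thrown only after several PK changes, calling AcceptChanges after every change, or after every few changes, may help.
Please note that I am guessing. Your DS is not connected to a database, so change tracking might be off by default. But I doubt it; I recall that even for XML files you can ask the DS to dump the data together with a change log. I think it is on by default.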
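The guess can be checked with a few lines. The sketch below (the `RowVersionDemo` class and the two-column table are illustrative assumptions) shows that a DataTable with no database behind it still keeps the original version of a modified row until AcceptChanges is called:

```csharp
using System;
using System.Data;

public static class RowVersionDemo
{
    // Returns the row state and the original ID before and after AcceptChanges.
    public static string Run()
    {
        var table = new DataTable("Planet");
        table.Columns.Add("ID", typeof(int));
        table.Columns.Add("Name", typeof(string));
        table.Rows.Add(1, "Earth");
        table.AcceptChanges();                 // the row is now Unchanged

        DataRow row = table.Rows[0];
        row["ID"] = 42;                        // the old record is kept next to the new one

        string before = row.RowState + ":" + row["ID", DataRowVersion.Original];
        table.AcceptChanges();                 // the old record is dropped
        string after = row.RowState + ":" + row["ID", DataRowVersion.Original];

        return before + " -> " + after;        // "Modified:1 -> Unchanged:42"
    }
}
```

So the extra copy exists for every modified row, whether or not a database is involved, and accepting changes in batches is what releases it.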