I am trying to save streaming data from a pressure map. Basically, my pressure matrix is defined as:
double[,] pressureMatrix = new double[e.Data.GetLength(0), e.Data.GetLength(1)];
Basically, I receive a new pressureMatrix every 10 ms, and I want to save all of this information in a JSON file so that I can reproduce it later.
First of all, I write what I call the header, with all of the settings used for the recording:
recordedData.softwareVersion = Assembly.GetExecutingAssembly().GetName().Version.Major.ToString() + "." + Assembly.GetExecutingAssembly().GetName().Version.Minor.ToString();
recordedData.calibrationConfiguration = calibrationConfiguration;
recordedData.representationConfiguration = representationSettings;
recordedData.pressureData = new List<PressureMap>();
var json = JsonConvert.SerializeObject(csvRecordedData, Formatting.None);
File.WriteAllText(this.filePath, json);
Then, every time I get a new pressure map, I create a new thread to add the new PressureMap and rewrite the file:
var newPressureMatrix = new PressureMap(datos, DateTime.Now);
recordedData.pressureData.Add(newPressureMatrix);
var json = JsonConvert.SerializeObject(recordedData, Formatting.None);
File.WriteAllText(this.filePath, json);
After about 20-30 minutes I get an OutOfMemory exception, because the system cannot keep the recordedData variable in memory: its List<PressureMap> has grown too large.
How should I handle this so I can save the data? I want to record 24-48 hours of information.
Answer (score: 5):
Your basic problem is that you are keeping all of your pressure map samples in memory, rather than writing each one individually and then letting it be garbage collected. What's worse, you are doing this in two different places:

1. You serialize the entire list of samples to a JSON string (the json variable) before writing it to the file. Instead, as explained in Performance Tips: Optimize Memory Usage, in situations like this you should serialize and deserialize directly to and from the file. For instructions on how to do this, see this answer to Can Json.NET serialize / deserialize to / from a stream? as well as Serialize JSON to a file. (A minimal sketch of this is shown just after this list.)

2. Your recordedData.pressureData = new List<PressureMap>(); accumulates every pressure map sample and then writes all of them out each time a new sample is created.
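For reference, here is a minimal sketch (not from the original answer) of what serializing directly to the file could look like for the header write, assuming the same recordedData variable and filePath field shown in the question:

// Sketch: stream the object straight to the file instead of first building a large JSON string.
using (var stream = new FileStream(this.filePath, FileMode.Create))
using (var textWriter = new StreamWriter(stream))
{
    JsonSerializer.CreateDefault().Serialize(textWriter, recordedData);
}

This removes the intermediate string, but by itself it does not address problem #2 below.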
A better solution would be to write each sample out once and then forget it, but the requirement that every sample be nested inside some container object in the JSON makes it non-obvious how to do that.
So, how do we deal with problem #2?
First, let's modify the data model as follows, splitting the header data into its own class:
public class PressureMap
{
    public double[,] PressureMatrix { get; set; }
}

public class CalibrationConfiguration
{
    // Data model not included in question
}

public class RepresentationConfiguration
{
    // Data model not included in question
}

public class RecordedDataHeader
{
    public string SoftwareVersion { get; set; }
    public CalibrationConfiguration CalibrationConfiguration { get; set; }
    public RepresentationConfiguration RepresentationConfiguration { get; set; }
}

public class RecordedData
{
    // Ensure the header is serialized first.
    [JsonProperty(Order = 1)]
    public RecordedDataHeader RecordedDataHeader { get; set; }

    // Ensure the pressure data is serialized last.
    [JsonProperty(Order = 2)]
    public IEnumerable<PressureMap> PressureData { get; set; }
}
Option #1 is a version of the producer-consumer pattern. It involves two threads: one to generate the PressureData samples and a second to serialize the RecordedData. The first thread generates the samples and adds them to a BlockingCollection<PressureMap> that is passed to the second thread. The second thread then serializes BlockingCollection<PressureMap>.GetConsumingEnumerable() as the value of RecordedData.PressureData.
The following code gives a skeleton for how to do this:
var sampleCount = 400;   // Or whatever stopping criterion you prefer
var sampleInterval = 10; // in ms

using (var pressureData = new BlockingCollection<PressureMap>())
{
    // Adapted from
    // https://docs.microsoft.com/en-us/dotnet/standard/collections/thread-safe/blockingcollection-overview
    // https://docs.microsoft.com/en-us/dotnet/api/system.collections.concurrent.blockingcollection-1?view=netframework-4.7.2

    // Spin up a Task to sample the pressure maps
    using (Task t1 = Task.Factory.StartNew(() =>
    {
        for (int i = 0; i < sampleCount; i++)
        {
            var data = GetPressureMap(i);
            Console.WriteLine("Generated sample {0}", i);
            pressureData.Add(data);
            System.Threading.Thread.Sleep(sampleInterval);
        }
        pressureData.CompleteAdding();
    }))
    {
        // Spin up a Task to consume the BlockingCollection
        using (Task t2 = Task.Factory.StartNew(() =>
        {
            var recordedDataHeader = new RecordedDataHeader
            {
                SoftwareVersion = softwareVersion,
                CalibrationConfiguration = calibrationConfiguration,
                RepresentationConfiguration = representationConfiguration,
            };

            var settings = new JsonSerializerSettings
            {
                ContractResolver = new CamelCasePropertyNamesContractResolver(),
            };

            using (var stream = new FileStream(this.filePath, FileMode.Create))
            using (var textWriter = new StreamWriter(stream))
            using (var jsonWriter = new JsonTextWriter(textWriter))
            {
                int j = 0;

                var query = pressureData
                    .GetConsumingEnumerable()
                    .Select(p =>
                    {
                        // Flush the writer periodically in case the process terminates abnormally
                        jsonWriter.Flush();
                        Console.WriteLine("Serializing item {0}", j++);
                        return p;
                    });

                var recordedData = new RecordedData
                {
                    RecordedDataHeader = recordedDataHeader,
                    // Since PressureData is declared as IEnumerable<PressureMap>, evaluation will be lazy.
                    PressureData = query,
                };
                Console.WriteLine("Beginning serialization of {0} to {1}:", recordedData, this.filePath);

                JsonSerializer.CreateDefault(settings).Serialize(textWriter, recordedData);

                Console.WriteLine("Finished serialization of {0} to {1}.", recordedData, this.filePath);
            }
        }))
        {
            Task.WaitAll(t1, t2);
        }
    }
}
Notes:

This solution relies on the fact that, when serializing an IEnumerable<T>, Json.NET will not materialize the enumerable into a list. Instead it takes full advantage of lazy evaluation and simply enumerates it, writing and then forgetting each item it encounters.

The first thread samples the PressureData and adds the samples to the blocking collection.

The second thread wraps the blocking collection in an IEnumerable<PressureData> and then serializes that as RecordedData.PressureData.

During serialization, the serializer enumerates through the IEnumerable<PressureData>, streaming each item to the JSON file and then proceeding to the next, effectively blocking until one becomes available.

You will need to do some experimentation to make sure that the serialization thread can "keep up" with the sampling thread, possibly by setting a BoundedCapacity when constructing the blocking collection (see the sketch after these notes). If it cannot keep up, you may need to adopt a different strategy.

PressureMap GetPressureMap(int count) should be some method of yours (not shown in the question) that returns the current pressure map sample.

With this technique the JSON file remains open for the duration of the sampling session. If sampling terminates abnormally, the file may be truncated. I make some attempt to mitigate this by flushing the writer periodically.

While serializing the data no longer requires an unbounded amount of memory, deserializing a RecordedData later will deserialize the PressureData array into a concrete List<PressureMap>. This can cause memory problems during downstream processing.
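As an illustration (not part of the original answer), bounding the collection could look roughly like this; the capacity of 100 is an arbitrary, hypothetical value that you would tune to your sampling rate:

// Hypothetical sketch: cap the queue at 100 pending samples. If the consumer
// falls behind, Add() blocks the sampling thread instead of letting the queue
// (and memory use) grow without bound.
using (var pressureData = new BlockingCollection<PressureMap>(boundedCapacity: 100))
{
    // ... same producer and consumer tasks as in the skeleton above ...
}

Keep in mind that blocking the sampling thread may not be acceptable if you truly need one sample every 10 ms, which is why the note above mentions that a different strategy may be needed.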
Demo fiddle #1 here.
Option #2 is to switch from a single JSON file to a Newline Delimited JSON file. Such a file consists of a sequence of JSON objects separated by newline characters. In your case, you would make the first object contain the RecordedDataHeader information, and have the subsequent objects be of type PressureMap:
var sampleCount = 100; // Or whatever
var sampleInterval = 10;

var recordedDataHeader = new RecordedDataHeader
{
    SoftwareVersion = softwareVersion,
    CalibrationConfiguration = calibrationConfiguration,
    RepresentationConfiguration = representationConfiguration,
};

var settings = new JsonSerializerSettings
{
    ContractResolver = new CamelCasePropertyNamesContractResolver(),
};

// Write the header
Console.WriteLine("Beginning serialization of sample data to {0}.", this.filePath);

using (var stream = new FileStream(this.filePath, FileMode.Create))
{
    JsonExtensions.ToNewlineDelimitedJson(stream, new[] { recordedDataHeader });
}

// Write each sample incrementally
for (int i = 0; i < sampleCount; i++)
{
    Thread.Sleep(sampleInterval);
    Console.WriteLine("Performing sample {0} of {1}", i, sampleCount);
    var map = GetPressureMap(i);

    using (var stream = new FileStream(this.filePath, FileMode.Append))
    {
        JsonExtensions.ToNewlineDelimitedJson(stream, new[] { map });
    }
}
Console.WriteLine("Finished serialization of sample data to {0}.", this.filePath);
Using the extension methods:
public static partial class JsonExtensions
{
    // Adapted from the answer to
    // https://stackoverflow.com/questions/44787652/serialize-as-ndjson-using-json-net
    // by dbc https://stackoverflow.com/users/3744182/dbc
    public static void ToNewlineDelimitedJson<T>(Stream stream, IEnumerable<T> items)
    {
        // Let caller dispose the underlying stream
        using (var textWriter = new StreamWriter(stream, new UTF8Encoding(false, true), 1024, true))
        {
            ToNewlineDelimitedJson(textWriter, items);
        }
    }

    public static void ToNewlineDelimitedJson<T>(TextWriter textWriter, IEnumerable<T> items)
    {
        var serializer = JsonSerializer.CreateDefault();

        foreach (var item in items)
        {
            // Formatting.None is the default; I set it here for clarity.
            using (var writer = new JsonTextWriter(textWriter) { Formatting = Formatting.None, CloseOutput = false })
            {
                serializer.Serialize(writer, item);
            }
            // http://specs.okfnlabs.org/ndjson/
            // Each JSON text MUST conform to the [RFC7159] standard and MUST be written to the stream followed by the newline character \n (0x0A).
            // The newline character MAY be preceded by a carriage return \r (0x0D). The JSON texts MUST NOT contain newlines or carriage returns.
            textWriter.Write("\n");
        }
    }

    // Adapted from the answer to
    // https://stackoverflow.com/questions/29729063/line-delimited-json-serializing-and-de-serializing
    // by Yuval Itzchakov https://stackoverflow.com/users/1870803/yuval-itzchakov
    public static IEnumerable<TBase> FromNewlineDelimitedJson<TBase, THeader, TRow>(TextReader reader)
        where THeader : TBase
        where TRow : TBase
    {
        bool first = true;

        using (var jsonReader = new JsonTextReader(reader) { CloseInput = false, SupportMultipleContent = true })
        {
            var serializer = JsonSerializer.CreateDefault();

            while (jsonReader.Read())
            {
                if (jsonReader.TokenType == JsonToken.Comment)
                    continue;
                if (first)
                {
                    yield return serializer.Deserialize<THeader>(jsonReader);
                    first = false;
                }
                else
                {
                    yield return serializer.Deserialize<TRow>(jsonReader);
                }
            }
        }
    }
}
Later, you can process the newline delimited JSON file as follows:
using (var stream = File.OpenRead(filePath))
using (var textReader = new StreamReader(stream))
{
    foreach (var obj in JsonExtensions.FromNewlineDelimitedJson<object, RecordedDataHeader, PressureMap>(textReader))
    {
        if (obj is RecordedDataHeader)
        {
            var header = (RecordedDataHeader)obj;
            // Process the header
            Console.WriteLine(JsonConvert.SerializeObject(header));
        }
        else
        {
            var row = (PressureMap)obj;
            // Process the row.
            Console.WriteLine(JsonConvert.SerializeObject(row));
        }
    }
}
Notes:

This approach looks simpler because the samples are appended incrementally to the end of the file, rather than being inserted into some overall JSON container.

With this approach, both serialization and downstream processing can be done with bounded memory use.

The sample file does not remain open during sampling, so it is less likely to be truncated.

Downstream applications may not have built-in tools for processing newline delimited JSON. (If you later need a single conventional JSON file, see the sketch after these notes for one way to convert it.)

This strategy may integrate more simply with your current threading code.
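As an illustration (not part of the original answer), converting the newline delimited log back into a single conventional JSON document afterwards could look roughly like this, reusing the FromNewlineDelimitedJson helper above. Here ndjsonPath and outputPath are hypothetical variables, and note that this step materializes all samples in memory:

// Hypothetical post-processing sketch: rebuild a single RecordedData document
// from the newline delimited file. Only suitable when the recording fits in memory.
using (var textReader = new StreamReader(File.OpenRead(ndjsonPath)))
{
    RecordedDataHeader header = null;
    var samples = new List<PressureMap>();

    foreach (var obj in JsonExtensions.FromNewlineDelimitedJson<object, RecordedDataHeader, PressureMap>(textReader))
    {
        if (obj is RecordedDataHeader h)
            header = h;
        else
            samples.Add((PressureMap)obj);
    }

    var recordedData = new RecordedData { RecordedDataHeader = header, PressureData = samples };
    File.WriteAllText(outputPath, JsonConvert.SerializeObject(recordedData, Formatting.None));
}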
Demo fiddle #2 here.