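Using JSON.NET, I am reading JSON objects in an array from a large file. As each JSON object is read, it is conditionally converted to the destination class and returned as an item in an IEnumerable.
I use IEnumerable so that I can "pull" objects from the file and process them as they are read, which avoids reading all objects into memory.
I use a similar technique when reading rows from a CSV file, where I use CsvHelper's ShouldSkipRecord() to conditionally process a row in the CSV file.
I have not found a way to filter the JSON objects as they are read from the array, so I end up using LINQ Where to filter the objects before they are converted and added to the IEnumerable. The problem is that the Where clause reads all objects into memory, which defeats the purpose of using IEnumerable.
I know I can manually read each object and then process it, but I am looking for a more elegant way: some form of callback that lets me pull records while the unwanted records are filtered out by the callback.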
For example, this is how I filter rows in the CSV file:
internal static bool ShouldSkipRecord(string[] fields)
{
    // Skip rows with incomplete data
    // 2019-01-24 20:46:57 UTC,63165,4.43,6.23,6.80,189,-18,81.00,16.00,6.23
    // 2019-01-24 20:47:40 UTC,63166,4.93,5.73,5.73,0,-20,,,5.73
    if (fields.Length < 10)
        return true;

    // Temperature and humidity is optional, air quality is required
    if (string.IsNullOrEmpty(fields[9]))
        return true;

    return false;
}
For example, this is how I filter the JSON objects:
internal static PurpleAirData Convert(Feed jsonData)
{
    PurpleAirData data = new PurpleAirData()
    {
        TimeStamp = jsonData.CreatedAt.DateTime,
        AirQuality = Double.Parse(jsonData.Field8)
    };

    // Temperature and humidity is optional
    if (double.TryParse(jsonData.Field6, out double val))
        data.Temperature = val;
    if (double.TryParse(jsonData.Field7, out val))
        data.Humidity = val;

    return data;
}

internal static IEnumerable<PurpleAirData> Load(JsonTextReader jsonReader)
{
    // Deserialize objects in parts
    jsonReader.SupportMultipleContent = true;
    JsonSerializer serializer = new JsonSerializer();

    // Read Channel
    // TODO : Add format checking
    jsonReader.Read();
    jsonReader.Read();
    jsonReader.Read();
    Channel channel = serializer.Deserialize<Channel>(jsonReader);

    // Read the Feeds
    jsonReader.Read();
    jsonReader.Read();

    // TODO : The Where results in a full in-memory iteration defeating the purpose of the streaming iteration
    return serializer.Deserialize<List<Feed>>(jsonReader).Where(feed => !string.IsNullOrEmpty(feed.Field8)).Select(Convert);
}
Example JSON:
{
  "channel":{
    "id":622370,
    "name":"AirMonitor_e81a",
    "latitude":"0.0",
    "longitude":"0.0",
    "field1":"PM1.0 (ATM)",
    "field2":"PM2.5 (ATM)",
    "field3":"PM10.0 (ATM)",
    "field4":"Uptime",
    "field5":"RSSI",
    "field6":"Temperature",
    "field7":"Humidity",
    "field8":"PM2.5 (CF=1)",
    "created_at":"2018-11-09T00:35:34Z",
    "updated_at":"2018-11-09T00:35:35Z",
    "last_entry_id":65435
  },
  "feeds":[
    {
      "created_at":"2019-01-10T23:56:09Z",
      "entry_id":56401,
      "field1":"1.00",
      "field2":"1.80",
      "field3":"1.80",
      "field4":"369",
      "field5":"-30",
      "field6":"66.00",
      "field7":"59.00",
      "field8":"1.80"
    },
    {
      "created_at":"2019-01-10T23:57:29Z",
      "entry_id":56402,
      "field1":"1.08",
      "field2":"2.44",
      "field3":"3.33",
      "field4":"371",
      "field5":"-32",
      "field6":"66.00",
      "field7":"59.00",
      "field8":"2.44"
    },
    {
      "created_at":"2019-01-26T00:14:04Z",
      "entry_id":64400,
      "field1":"0.27",
      "field2":"0.95",
      "field3":"1.25",
      "field4":"213",
      "field5":"-27",
      "field6":"72.00",
      "field7":"40.00",
      "field8":"0.95"
    }
  ]
}
Second example JSON, where the array is the root container:
[
  {
    "monthlyrainin": 0.01,
    "humidityin": 42,
    "eventrainin": 0,
    "humidity": 29,
    "maxdailygust": 20.13,
    "dateutc": 1549476900000,
    "battout": "1",
    "lastRain": "2019-02-05T19:21:00.000Z",
    "dailyrainin": 0,
    "tempf": 52.2,
    "winddir": 286,
    "totalrainin": 0.01,
    "dewPoint": 20.92,
    "baromabsin": 29.95,
    "hourlyrainin": 0,
    "feelsLike": 52.2,
    "yearlyrainin": 0.01,
    "uv": 1,
    "weeklyrainin": 0.01,
    "solarradiation": 157.72,
    "windspeedmph": 0,
    "tempinf": 73.8,
    "windgustmph": 0,
    "battin": "1",
    "baromrelin": 30.12,
    "date": "2019-02-06T18:15:00.000Z"
  },
  {
    "dewPoint": 20.92,
    "tempf": 52.2,
    "maxdailygust": 20.13,
    "humidityin": 42,
    "windspeedmph": 4.03,
    "eventrainin": 0,
    "tempinf": 73.6,
    "feelsLike": 52.2,
    "dateutc": 1549476600000,
    "windgustmph": 4.92,
    "hourlyrainin": 0,
    "monthlyrainin": 0.01,
    "battin": "1",
    "humidity": 29,
    "totalrainin": 0.01,
    "baromrelin": 30.12,
    "winddir": 314,
    "lastRain": "2019-02-05T19:21:00.000Z",
    "yearlyrainin": 0.01,
    "baromabsin": 29.94,
    "dailyrainin": 0,
    "battout": "1",
    "uv": 1,
    "solarradiation": 151.86,
    "weeklyrainin": 0.01,
    "date": "2019-02-06T18:10:00.000Z"
  }
]
Is there a way in JSON.NET to filter objects as they are being read?
Answer 0 (score: 1)
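What you can do is adopt the basic approach of "Issues parsing a 1GB json file using JSON.NET" and "Deserialize json array stream one item at a time", which is to stream through the array and yield return each item; but in addition, apply a where expression to filter out incomplete items, or a select clause to transform some intermediate deserialized object (such as a JObject or a DTO) into your final data model. Because the where clause is applied while streaming, unwanted objects are never added to the list being deserialized, so they become eligible for garbage collection during streaming. Filtering array contents while streaming can be done at the root level, when the root JSON container is an array, or as part of a custom JsonConverter for List<T>, when the array to be deserialized is nested inside some outer JSON.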
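To illustrate why filtering during streaming keeps memory low, here is a minimal sketch that uses no JSON at all. ReadItems is a hypothetical stand-in for a streaming reader, and Produced is a counter added only for this demonstration. It shows that Where over a yield return source pulls items one at a time; buffering only happens when the source itself is a fully materialized List<T>.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyFilterDemo
{
    // Counts how many items the (hypothetical) streaming source has produced so far.
    public static int Produced;

    // Stand-in for a streaming reader: yields one item at a time on demand.
    public static IEnumerable<int> ReadItems(int count)
    {
        for (int i = 0; i < count; i++)
        {
            Produced++;
            yield return i;
        }
    }

    // Filters while streaming and stops as soon as enough matches were found.
    public static List<int> FirstEven(int count, int take)
    {
        Produced = 0;
        return ReadItems(count)
            .Where(i => i % 2 == 0)
            .Take(take)
            .ToList();
    }
}
```

Calling LazyFilterDemo.FirstEven(1000000, 3) only advances the source far enough to find three matches, so Produced stays tiny compared to the million items available.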
As a concrete example, consider your first JSON example. You would like to deserialize it to a data model that looks like this:
public class PurpleAirData
{
    public PurpleAirData(DateTime createdAt, double airQuality)
    {
        this.CreatedAt = createdAt;
        this.AirQuality = airQuality;
    }

    // Required properties
    public DateTime CreatedAt { get; set; }
    public double AirQuality { get; set; }

    // Optional properties, thus nullable
    public double? Temperature { get; set; }
    public double? Humidity { get; set; }
}

public class RootObject
{
    public Channel channel { get; set; } // Define this using http://json2csharp.com/
    public List<PurpleAirData> feeds { get; set; }
}
To do this, first introduce the following extension methods:
public static partial class JsonExtensions
{
    public static IEnumerable<T> DeserializeArrayItems<T>(this JsonSerializer serializer, JsonReader reader)
    {
        if (reader.MoveToContent().TokenType == JsonToken.Null)
            yield break;
        if (reader.TokenType != JsonToken.StartArray)
            throw new JsonSerializationException(string.Format("Current token {0} is not an array at path {1}", reader.TokenType, reader.Path));
        // Process the collection items
        while (reader.Read())
        {
            switch (reader.TokenType)
            {
                case JsonToken.EndArray:
                    yield break;
                case JsonToken.Comment:
                    break;
                default:
                    yield return serializer.Deserialize<T>(reader);
                    break;
            }
        }
        // Should not come here.
        throw new JsonReaderException(string.Format("Unclosed array at path {0}", reader.Path));
    }

    public static JsonReader MoveToContent(this JsonReader reader)
    {
        if (reader.TokenType == JsonToken.None)
            reader.Read();
        while (reader.TokenType == JsonToken.Comment && reader.Read())
            ;
        return reader;
    }
}
Next, introduce the following JsonConverter for List<PurpleAirData>:
class PurpleAirListConverter : JsonConverter
{
    // Intermediate DTO: every member is nullable so missing values can be detected.
    class PurpleAirDataDTO
    {
        // Required properties
        [JsonProperty("created_at")]
        public DateTime? CreatedAt { get; set; }
        [JsonProperty("field8")]
        public double? AirQuality { get; set; }

        // Optional properties
        [JsonProperty("field6")]
        public double? Temperature { get; set; }
        [JsonProperty("field7")]
        public double? Humidity { get; set; }
    }

    public override bool CanConvert(Type objectType)
    {
        return objectType == typeof(List<PurpleAirData>);
    }

    public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
    {
        if (reader.MoveToContent().TokenType == JsonToken.Null)
            return null;
        var list = existingValue as List<PurpleAirData> ?? new List<PurpleAirData>();
        // Stream the array items, drop incomplete ones, and map the rest to the final model.
        var query = from dto in serializer.DeserializeArrayItems<PurpleAirDataDTO>(reader)
                    where dto != null && dto.CreatedAt != null && dto.AirQuality != null
                    select new PurpleAirData(dto.CreatedAt.Value, dto.AirQuality.Value) { Humidity = dto.Humidity, Temperature = dto.Temperature };
        list.AddRange(query);
        return list;
    }

    public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
    {
        throw new NotImplementedException();
    }
}
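The purpose of this converter is to stream through the "feeds" array, deserialize each JSON item to an intermediate PurpleAirDataDTO, check that the required members are present, and then convert the DTO to the final model.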
Finally, deserialize the entire file as follows:
static RootObject DeserializePurpleAirDataFile(TextReader textReader)
{
    var settings = new JsonSerializerSettings
    {
        Converters = { new PurpleAirListConverter() },
        NullValueHandling = NullValueHandling.Ignore,
    };
    var serializer = JsonSerializer.CreateDefault(settings);
    using (var reader = new JsonTextReader(textReader) { CloseInput = false })
    {
        return serializer.Deserialize<RootObject>(reader);
    }
}
Demo fiddle here.
When the array to be filtered is the root container in the JSON file, the extension method JsonExtensions.DeserializeArrayItems() can be used directly, for example as follows:
static bool IsValid(WeatherData data)
{
    // Return false if certain fields are missing
    // Otherwise return true;
    return true;
}

static List<WeatherData> DeserializeFilteredWeatherData(TextReader textReader)
{
    var serializer = JsonSerializer.CreateDefault();
    using (var reader = new JsonTextReader(textReader) { CloseInput = false })
    {
        var query = from data in serializer.DeserializeArrayItems<WeatherData>(reader)
                    where IsValid(data)
                    select data;
        return query.ToList();
    }
}
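If you want to keep the "pull" behaviour from the question rather than building a list, one possible variation is to wrap the loop in an iterator method so that the reader stays open while the caller enumerates. The sketch below is an illustration under stated assumptions, not part of the approach above: RootArrayStreaming, StreamFiltered and the Reading class are hypothetical names, and it assumes the root array contains only JSON objects.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Newtonsoft.Json;

// Hypothetical model used only for this sketch.
public class Reading
{
    [JsonProperty("tempf")]
    public double? TempF { get; set; }
    [JsonProperty("humidity")]
    public double? Humidity { get; set; }
}

public static class RootArrayStreaming
{
    // Lazily yields the items of a root-level JSON array that satisfy the predicate.
    // The reader is disposed when enumeration finishes or the enumerator is disposed.
    public static IEnumerable<T> StreamFiltered<T>(TextReader textReader, Func<T, bool> predicate)
    {
        var serializer = JsonSerializer.CreateDefault();
        using (var reader = new JsonTextReader(textReader) { CloseInput = false })
        {
            while (reader.Read())
            {
                // Each StartObject at this level begins one array item;
                // Deserialize consumes the whole object, including nested content.
                if (reader.TokenType == JsonToken.StartObject)
                {
                    var item = serializer.Deserialize<T>(reader);
                    if (predicate(item))
                        yield return item;
                }
            }
        }
    }
}
```

Because the method is an iterator, nothing is read until the caller starts a foreach, and only one item is held at a time unless the caller chooses to call ToList().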
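Notes:
Nullable types can be used to track whether value-type members were actually encountered during deserialization.
Here the conversion from the DTO to the final data model is done manually, but for more complicated models something like AutoMapper could be used.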
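As a small illustration of the first note, the sketch below contrasts a nullable member with a non-nullable one. FeedDto and NullableDemo are hypothetical names introduced only for this example; the field names follow the sample "feeds" items, whose values are numeric strings.

```csharp
using System;
using Newtonsoft.Json;

// Hypothetical DTO used only for this sketch.
public class FeedDto
{
    // Nullable: stays null when "field8" is absent from the JSON.
    [JsonProperty("field8")]
    public double? AirQuality { get; set; }

    // Non-nullable: falls back to 0 when "field6" is absent,
    // which cannot be told apart from a real reading of 0.
    [JsonProperty("field6")]
    public double Temperature { get; set; }
}

public static class NullableDemo
{
    public static FeedDto Parse(string json)
    {
        return JsonConvert.DeserializeObject<FeedDto>(json);
    }
}
```

With this DTO, parsing an item that lacks "field8" leaves AirQuality as null, so a where clause can reject it, while the missing "field6" is silently reported as 0.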