I have a large (~1 GB) file in which each line is a JSON object with nested properties; some of the values may be arrays of objects.
Example object:
{
"business_id": "b9WZJp5L1RZr4F1nxclOoQ",
"full_address": "1073 Washington Ave\nCarnegie, PA 15106",
"hours": {
"Monday": {
"close": "14:30",
"open": "06:00"
},
"Tuesday": {
"close": "14:30",
"open": "06:00"
},
"Friday": {
"close": "14:30",
"open": "06:00"
},
"Wednesday": {
"close": "14:30",
"open": "06:00"
},
"Thursday": {
"close": "14:30",
"open": "06:00"
},
"Sunday": {
"close": "12:30",
"open": "07:00"
},
"Saturday": {
"close": "12:30",
"open": "06:00"
}
},
"open": true,
"categories": ["Breakfast & Brunch", "Restaurants"],
"city": "Carnegie",
"review_count": 38,
"name": "Gab & Eat",
"neighborhoods": [],
"longitude": -80.084799799999999,
"state": "PA",
"stars": 4.5,
"latitude": 40.396744099999999,
"attributes": {
"Alcohol": "none",
"Noise Level": "average",
"Has TV": true,
"Attire": "casual",
"Ambience": {
"romantic": false,
"intimate": false,
"classy": false,
"hipster": false,
"divey": true,
"touristy": false,
"trendy": false,
"upscale": false,
"casual": true
},
"Good for Kids": true,
"Wheelchair Accessible": false,
"Delivery": false,
"Caters": true,
"BYOB": false,
"Corkage": false,
"Accepts Credit Cards": false,
"BYOB/Corkage": "yes_free",
"Take-out": true,
"Price Range": 1,
"Outdoor Seating": false,
"Takes Reservations": false,
"Waiter Service": true,
"Wi-Fi": "no",
"Order at Counter": true,
"Good For": {
"dessert": false,
"latenight": false,
"lunch": false,
"dinner": false,
"brunch": false,
"breakfast": true
},
"Parking": {
"garage": false,
"street": false,
"validated": false,
"lot": true,
"valet": false
},
"Good For Kids": true,
"Good For Groups": false
},
"type": "business"
},
How can I flatten this and convert it to CSV, so that I end up with columns such as business_id, hours.Monday.close, attributes.Ambience.hipster, and so on?
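The problem is that not every object has every property, so I need to scan the whole file to collect the list of all possible flattened properties. Essentially I am trying to mimic what json2csv does, except that for array-valued properties I do not want to split the array into multiple columns; I want to store the whole array as a single string value in the CSV.
How can I do this with Python or .NET?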
Answer 0 (score: 1)
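This works. You may need to extend it further to drill into arrays and so on. It uses Newtonsoft's JSON library and assumes the JSON string is an object, not an array or a primitive (or anything else).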
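// LINQPad script. Requires the Newtonsoft.Json and Newtonsoft.Json.Linq namespaces.
void Main()
{
    // jsonStr is assumed to hold a single JSON object (one line of the input file).
    var obj = JsonConvert.DeserializeObject(jsonStr) as JObject;
    var props = GetPropPaths(string.Empty, obj);
    props.Dump(); // Dump() is LINQPad's output helper
}

// Recursively yields (dotted path, value) pairs for every property that is not itself an object.
private IEnumerable<Tuple<string, string>> GetPropPaths(string currPath, JObject obj)
{
    foreach (var prop in obj.Properties())
    {
        var propPath = string.IsNullOrWhiteSpace(currPath) ? prop.Name : currPath + "." + prop.Name;
        if (prop.Value.Type == JTokenType.Object)
        {
            // Nested object: descend and extend the path.
            foreach (var subProp in GetPropPaths(propPath, prop.Value as JObject))
                yield return subProp;
        }
        else
        {
            // Leaf value (arrays included): emit its string form as-is.
            yield return new Tuple<string, string>(propPath, prop.Value.ToString());
        }
    }
}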
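Since the question also asks about Python, here is a rough sketch of the same flattening idea combined with the two-pass scan the question describes (first collect every column name, then write the rows). The function names `flatten` and `json_lines_to_csv` and the file-path parameters are made up for illustration, and it assumes every non-blank line is a JSON object. Arrays are stored whole as a JSON string in a single cell.

```python
import csv
import json


def flatten(obj, prefix=""):
    """Yield (dotted_path, value) pairs for a nested dict; lists are kept whole."""
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Nested object: descend and extend the dotted path.
            yield from flatten(value, path)
        elif isinstance(value, list):
            # Keep the whole array as one JSON string instead of splitting it.
            yield path, json.dumps(value)
        else:
            yield path, value


def json_lines_to_csv(src_path, dst_path):
    """Convert a file with one JSON object per line into a flat CSV."""
    # Pass 1: collect every flattened column name that occurs anywhere in the file.
    columns = {}
    with open(src_path, encoding="utf-8") as src:
        for line in src:
            if line.strip():
                for path, _ in flatten(json.loads(line)):
                    columns.setdefault(path, None)  # dict keeps first-seen order

    # Pass 2: write the rows; columns missing from an object become empty cells.
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=list(columns), restval="")
        writer.writeheader()
        for line in src:
            if line.strip():
                writer.writerow(dict(flatten(json.loads(line))))
    return list(columns)
```

Reading the file twice keeps memory use proportional to the number of distinct columns rather than to the file size, which matters for a ~1 GB input. Note that the `csv` module writes Python booleans as `True`/`False`; convert them first if you need lowercase JSON-style values.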
For the JSON in your question, the C# code gives the following: