如何解析,展平并将大型JSON文件转换为CSV文件?

时间:2015-10-22 02:30:43

标签: c# python .net json csv

基本上我有一个大的(~1 GB)文件,其中每一行都是一个JSON对象,包含嵌套属性,其中一些值可能是对象数组。

对象示例:

{
  "business_id": "b9WZJp5L1RZr4F1nxclOoQ",
  "full_address": "1073 Washington Ave\nCarnegie, PA 15106",
  "hours": {
    "Monday": {
      "close": "14:30",
      "open": "06:00"
    },
    "Tuesday": {
      "close": "14:30",
      "open": "06:00"
    },
    "Friday": {
      "close": "14:30",
      "open": "06:00"
    },
    "Wednesday": {
      "close": "14:30",
      "open": "06:00"
    },
    "Thursday": {
      "close": "14:30",
      "open": "06:00"
    },
    "Sunday": {
      "close": "12:30",
      "open": "07:00"
    },
    "Saturday": {
      "close": "12:30",
      "open": "06:00"
    }
  },
  "open": true,
  "categories": ["Breakfast & Brunch", "Restaurants"],
  "city": "Carnegie",
  "review_count": 38,
  "name": "Gab & Eat",
  "neighborhoods": [],
  "longitude": -80.084799799999999,
  "state": "PA",
  "stars": 4.5,
  "latitude": 40.396744099999999,
  "attributes": {
    "Alcohol": "none",
    "Noise Level": "average",
    "Has TV": true,
    "Attire": "casual",
    "Ambience": {
      "romantic": false,
      "intimate": false,
      "classy": false,
      "hipster": false,
      "divey": true,
      "touristy": false,
      "trendy": false,
      "upscale": false,
      "casual": true
    },
    "Good for Kids": true,
    "Wheelchair Accessible": false,
    "Delivery": false,
    "Caters": true,
    "BYOB": false,
    "Corkage": false,
    "Accepts Credit Cards": false,
    "BYOB/Corkage": "yes_free",
    "Take-out": true,
    "Price Range": 1,
    "Outdoor Seating": false,
    "Takes Reservations": false,
    "Waiter Service": true,
    "Wi-Fi": "no",
    "Order at Counter": true,
    "Good For": {
      "dessert": false,
      "latenight": false,
      "lunch": false,
      "dinner": false,
      "brunch": false,
      "breakfast": true
    },
    "Parking": {
      "garage": false,
      "street": false,
      "validated": false,
      "lot": true,
      "valet": false
    },
    "Good For Kids": true,
    "Good For Groups": false
  },
  "type": "business"
},

如何展平并将其转换为CSV,以便我有一个包含business_idhours.Monday.closeattributes.Ambience.hipster等属性(列)的对象?

问题是并非所有对象都具有所有属性,因此我需要扫描整个文件以获取所有可能的平面属性的列表。基本上我试图模仿json2csv的功能,除了对于数组值属性,我不将它分成多个列,而是将整个数组字符串存储为CSV格式的值。

如何使用Python或.NET实现此目的?

1 个答案:

答案 0 :(得分:1)

这很有效。可能需要更多地扩展它以钻取数组等。使用Newtonsoft的JSON库,并假设JSON字符串是对象,而不是数组或基元(或任何东西)别的)

void Main()
{
    var obj = JsonConvert.DeserializeObject(jsonStr) as JObject;    
    var props = GetPropPaths(string.Empty, obj);
    props.Dump();
}

private IEnumerable<Tuple<string, string>> GetPropPaths(string currPath, JObject obj)
{
    foreach(var prop in obj.Properties())
    {
        var propPath = string.IsNullOrWhiteSpace(currPath) ? prop.Name : currPath + "." + prop.Name;

        if (prop.Value.Type == JTokenType.Object)
        {   
            foreach(var subProp in GetPropPaths(propPath, prop.Value as JObject))
                yield return subProp;
        } else {
            yield return new Tuple<string, string>(propPath, prop.Value.ToString());
        }
    }
}

对于您的上述json,它提供以下内容:

enter image description here