How do I parse a JSON column in a pandas DataFrame and join the resulting DataFrame to the original?

Time: 2021-01-28 17:34:35

Tags: json python-3.x pandas merge

I have the following sample df:

{'id_user': {0: -8884522802746938515,
  1: -8884522802746938515,
  2: -8884522802746938515},
 'time': {0: '2021-01-01 11:10:34',
  1: '2021-01-01 11:11:48',
  2: '2021-01-01 11:12:38'},
 'data': {0: '{"fat": 4, "type": "FOOD_GENERAL", "unit": "1 mug (8 fl oz)", "title": "Cappuccino", "amount": 1.0, "protein": 4, "calories": 74, "foodType": 4, "recipeId": 7350, "servings": 1.0, "timestamp": "1609499434205", "ingredient": true, "carbohydrates": 6, "nutrientsData": {"iron": 0.19, "fiber": 0.2, "sugar": 6.41, "sodium": 50.0, "calcium": 144.0, "protein": 4.08, "fatTotal": 3.98, "vitaminA": 34.0, "potassium": 233.0, "cholesterol": 12.0, "fatSaturated": 2.273, "carbohydrates": 5.81, "energyConsumed": 74.0, "fatMonounsaturated": 1.007, "fatPolyunsaturated": 0.241}}',
  1: '{"fat": 1, "type": "FOOD_BRANDED", "unit": "1/2 cup prepared", "title": "Stove Top Stuffing Mix For Turkey (Kraft)", "amount": 1.0, "protein": 3, "calories": 110, "foodType": 5, "recipeId": 4072396, "servings": 1.0, "mealIndex": 2, "timestamp": "1609499508328", "ingredient": true, "carbohydrates": 21, "nutrientsData": {"iron": 1.3, "fiber": 1.0, "sugar": 2.0, "sodium": 370.0, "protein": 3.0, "fatTotal": 1.0, "potassium": 100.0, "carbohydrates": 21.0, "energyConsumed": 110.0}}',
  2: '{"fat": 1, "type": "FOOD_BRANDED", "unit": "1/2 cup prepared", "title": "Stove Top Stuffing Mix For Turkey (Kraft)", "amount": 1.0, "protein": 3, "calories": 110, "foodType": 5, "recipeId": 4072396, "servings": 1.0, "timestamp": "1609499558606", "ingredient": true, "carbohydrates": 21, "nutrientsData": {"iron": 1.3, "fiber": 1.0, "sugar": 2.0, "sodium": 370.0, "protein": 3.0, "fatTotal": 1.0, "potassium": 100.0, "carbohydrates": 21.0, "energyConsumed": 110.0}}'}}

I am running the following on the data column:

pd.json_normalize(df.data.apply(json.loads))

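This gives me what I need, but I want the result attached to the original df. Should I merge the two DataFrames on the index? Is there another way to do this in one line or in a single step?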

1 answer:

Answer 0 (score: 1)

The data column in df should first be converted from JSON strings to dicts.

Then use either:

  • Method 1. Convert df to a list of dicts and pass it to pd.json_normalize.
  • Method 2. Convert df['data'] to a DataFrame and merge it into the original df.
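import json
import pandas as pd

# parse each JSON string in the data column into a dict
df['data'] = df['data'].map(json.loads)

# method1: flatten whole records; nested keys become dotted columns such as data.fat
dfn = pd.json_normalize(df.to_dict(orient='records'))

# method2: expand the dicts into columns, then merge on the index
obj = df['data']
dfn = df.merge(pd.DataFrame(obj.tolist(), index=obj.index),
               left_index=True,
               right_index=True)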
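
To make the difference between the two methods concrete, here is a minimal, self-contained sketch. The sample is a shortened, hypothetical version of the question's df (fewer keys per record), and the variable names parsed, method1, method2 and joined are illustrative only. It also shows a possible one-line variant with df.join, which is an assumption about what the asker wants rather than part of the answer above; it relies on df having the default 0..n-1 index.

```python
import json

import pandas as pd

# Shortened, hypothetical sample with the same shape as the question's df.
df = pd.DataFrame({
    "id_user": [-8884522802746938515, -8884522802746938515],
    "time": ["2021-01-01 11:10:34", "2021-01-01 11:11:48"],
    "data": [
        '{"title": "Cappuccino", "calories": 74, '
        '"nutrientsData": {"iron": 0.19, "calcium": 144.0}}',
        '{"title": "Stove Top Stuffing Mix For Turkey (Kraft)", "calories": 110, '
        '"mealIndex": 2, "nutrientsData": {"iron": 1.3}}',
    ],
})

# Parse the JSON strings once; df itself is left unchanged here.
parsed = df["data"].map(json.loads)

# Method 1: normalize whole records. Nested keys are flattened and prefixed,
# e.g. data.title and data.nutrientsData.iron.
method1 = pd.json_normalize(df.assign(data=parsed).to_dict(orient="records"))

# Method 2: expand only the top level of each dict and merge on the index.
# nutrientsData stays a single column holding dicts.
method2 = df.merge(pd.DataFrame(parsed.tolist(), index=parsed.index),
                   left_index=True, right_index=True)

# One-line variant: flatten the parsed column and join it back on the index.
# json_normalize returns a fresh 0..n-1 index, so this assumes df has one too.
joined = df.join(pd.json_normalize(parsed.tolist()))

print(list(method1.columns))
print(list(method2.columns))
print(list(joined.columns))
```

Keys missing from a record (mealIndex in the first row, calcium in the second) come out as NaN. If df has a non-default index, calling df.reset_index(drop=True) before the join, or using method 2 with index=obj.index, keeps the rows aligned.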