Question

I have the following JSON data in `data`. I want to reshape it so that it looks like the expected result below.

import json
import pandas as pd

data = [{'useEventValue': True,
  'eventConditions': [{'type': 'CATEGORY',
    'matchType': 'EXACT',
    'expression': 'ABC'},
   {'type': 'ACTION',
    'matchType': 'EXACT',
    'expression': 'DEF'},
   {'type': 'LABEL', 'matchType': 'REGEXP', 'expression': 'GHI|JKL'}]}]

Expected result:

	Category_matchType	Category_expression	Action_matchType	Action_expression	Label_matchType	Label_expression
0	EXACT	ABC	EXACT	DEF	REGEXP	GHI|JKL

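What I've tried:

This question is similar, but I don't have an index as the OP did. Following that example, I tried json_normalize and then various combinations of melt, stack, unstack, pivot, etc. But there must be an easier way!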

# this bit of code produces the below result where I can start using reshaping functions to get to what I need but it seems messy
df = pd.json_normalize(data, 'eventConditions')

	type	matchType	expression
0	CATEGORY	EXACT	ABC
1	ACTION	EXACT	DEF
2	LABEL	REGEXP	GHI|JKL

Answer 1

We can use json_normalize to read the JSON data into a pandas DataFrame, then use stack followed by unstack to reshape it:

df = pd.json_normalize(data, 'eventConditions')
df = df.set_index([df.groupby('type').cumcount(), 'type']).stack().unstack([1, 2])
df.columns = df.columns.map('_'.join)

  CATEGORY_matchType CATEGORY_expression ACTION_matchType ACTION_expression LABEL_matchType LABEL_expression
0              EXACT                 ABC            EXACT               DEF          REGEXP          GHI|JKL
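
To see why the stack/unstack chain produces a single wide row, it can help to name the intermediate steps. The following is only a sketch of the same idea on the question's data; the variable names `flat`, `indexed`, `long_form`, and `wide` are mine, not part of the answer.

```python
import pandas as pd

data = [{'useEventValue': True,
         'eventConditions': [
             {'type': 'CATEGORY', 'matchType': 'EXACT', 'expression': 'ABC'},
             {'type': 'ACTION', 'matchType': 'EXACT', 'expression': 'DEF'},
             {'type': 'LABEL', 'matchType': 'REGEXP', 'expression': 'GHI|JKL'}]}]

# One row per condition, with columns type / matchType / expression.
flat = pd.json_normalize(data, 'eventConditions')

# cumcount() numbers repeated occurrences of each type. Every type occurs
# once here, so all rows get 0 and end up in the same output row.
indexed = flat.set_index([flat.groupby('type').cumcount(), 'type'])

# stack() moves the field names into the index:
# a Series indexed by (row number, type, field name).
long_form = indexed.stack()

# unstack([1, 2]) moves type and field name into the columns.
wide = long_form.unstack([1, 2])

# Join each (type, field) column tuple into a single flat name.
wide.columns = wide.columns.map('_'.join)

print(wide)
```

If the same type appeared twice, cumcount would assign 0 and 1 to the two occurrences, so the second one would land in a second row instead of colliding with the first.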

Answer 2

If your data is not too large, you can process the JSON data first and then create a DataFrame like this:

import pandas as pd
import json

data = [{'useEventValue': True,
  'eventConditions': [{'type': 'CATEGORY',
    'matchType': 'EXACT',
    'expression': 'ABC'},
   {'type': 'ACTION',
    'matchType': 'EXACT',
    'expression': 'DEF'},
   {'type': 'LABEL', 'matchType': 'REGEXP', 'expression': 'GHI|JKL'}]}]

# Collect values into one list per "<type>_<key>" column name.
new_data = {}
for record in data:
    for event in record['eventConditions']:
        for key, value in event.items():
            if key != 'type':
                col_name = event['type'] + '_' + key
                # list.append() returns None, so never assign its result;
                # setdefault() creates the list on first use.
                new_data.setdefault(col_name, []).append(value)

df = pd.DataFrame(new_data)
df
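
The loop above pools values from every record into shared column lists. A possible variation, sketched here under the assumption that each top-level record should become exactly one row, builds one flat dict per record and lets pandas align the columns. `flatten_record` is a hypothetical helper name, not something from pandas.

```python
import pandas as pd

data = [{'useEventValue': True,
         'eventConditions': [
             {'type': 'CATEGORY', 'matchType': 'EXACT', 'expression': 'ABC'},
             {'type': 'ACTION', 'matchType': 'EXACT', 'expression': 'DEF'},
             {'type': 'LABEL', 'matchType': 'REGEXP', 'expression': 'GHI|JKL'}]}]


def flatten_record(record):
    """Return one flat dict such as {'CATEGORY_matchType': 'EXACT', ...}."""
    return {
        f"{event['type']}_{key}": value
        for event in record['eventConditions']
        for key, value in event.items()
        if key != 'type'
    }


# One dict per record becomes one row per record.
df = pd.DataFrame([flatten_record(record) for record in data])
print(df)
```

Note that if a record repeats the same type, the later condition overwrites the earlier one in this sketch, so it only fits data where each type appears at most once per record.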

I just found a way to do it using only pandas:

df = pd.json_normalize(data, 'eventConditions')

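# Long format: one row per (type, field) pair, with columns type / variable / value.
df = df.melt(id_vars=['type'])
# Build the final column names, e.g. 'CATEGORY_matchType'.
df['type'] = df['type'] + '_' + df['variable']
df.drop(columns=['variable'], inplace=True)
df.set_index('type', inplace=True)
# Transpose so the names become columns and the values form a single row.
df = df.T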
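
After the transpose, the single row is labelled `value` and the column axis keeps a name left over from the index. If a plain 0-based row and unnamed columns are preferred, one way to tidy this up is sketched below; the names `long_form`, `column`, and `wide` are my own choices for illustration.

```python
import pandas as pd

data = [{'useEventValue': True,
         'eventConditions': [
             {'type': 'CATEGORY', 'matchType': 'EXACT', 'expression': 'ABC'},
             {'type': 'ACTION', 'matchType': 'EXACT', 'expression': 'DEF'},
             {'type': 'LABEL', 'matchType': 'REGEXP', 'expression': 'GHI|JKL'}]}]

df = pd.json_normalize(data, 'eventConditions')

# Long format with columns: type, variable, value.
long_form = df.melt(id_vars='type')
long_form['column'] = long_form['type'] + '_' + long_form['variable']

# Keep only the values, keyed by the new column names, then transpose.
wide = long_form.set_index('column')[['value']].T

# Replace the 'value' row label with 0 and drop the leftover axis name.
wide = wide.reset_index(drop=True).rename_axis(columns=None)
print(wide)
```

The columns come out grouped by field (all matchType columns, then all expression columns) rather than by type, because melt emits one field at a time.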

Flatten and shape JSON DataFrame

2 answers: