Question

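I'm trying to use Python 3 and Pandas to expand dict keys and values into their own columns. An example is below. Not all of the dicts have the same number of items, and the key names are not guaranteed to match for each metric type.

I want to convert this DataFrame: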

id  metric          dicts
1   some_metric_1   {'a': 161, 'b': 121}
2   some_metric_1   {'a': 152, 'c': 4}
2   some_metric_2   {'b': 162, 'a': 83}
3   some_metric_2   {'b': 103, 'z': 69}

which was created with:

import pandas as pd

data = {'id': [1, 2, 2, 3], 'metric': ['some_metric_1', 'some_metric_1', 'some_metric_2', 'some_metric_2'], 'dicts': [{'a': 161, 'b': 121}, {'a': 152, 'c': 4}, {'b': 162, 'a': 83}, {'b': 103, 'z': 69}]}
df = pd.DataFrame.from_dict(data)

into this:

id  metric          key value
1   some_metric_1   a   161
1   some_metric_1   b   121
2   some_metric_1   a   152
2   some_metric_1   c   4
2   some_metric_2   b   162
2   some_metric_2   a   83
3   some_metric_2   b   103
3   some_metric_2   z   69

Answer 1

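You can simply iterate over the rows of the DataFrame and extract the values you need, as shown below.

Keep in mind that the code below assumes each key has exactly one value (i.e. no lists of values are stored under a dict key). It works regardless of the number of keys, though.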

frames = []  # one single-row DataFrame per (key, value) pair

for _, row in df.iterrows():
    i = row['id']            # get the id value
    metric = row['metric']   # get the value in the metric column
    for key, value in row['dicts'].items():
        tmp_df = pd.DataFrame({
            'id': i,
            'metric': metric,
            'key': key,
            'value': value
        }, index=[0])

        # DataFrame.append was removed in pandas 2.0, so collect the frames and concat once
        frames.append(tmp_df)

# Every tmp_df has index 0, so ignore_index=True renumbers the rows 0..n-1
final_df = pd.concat(frames, ignore_index=True)

Output

    id  metric          key  value
0   1   some_metric_1   a    161
1   1   some_metric_1   b    121
2   2   some_metric_1   a    152
3   2   some_metric_1   c    4
4   2   some_metric_2   b    162
5   2   some_metric_2   a    83
6   3   some_metric_2   b    103
7   3   some_metric_2   z    69

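For more information, see the documentation for df.append(). Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0; pd.concat is its replacement.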
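Building one small DataFrame per key can get slow on large inputs. As a rough alternative (not part of the original answer), the sketch below uses `DataFrame.explode` instead of a Python-level loop over rows. It assumes pandas 1.1 or later (for the `ignore_index` argument) and that no dict is empty; the names `pairs`, `exploded`, `kv` and `long_df` are just illustrative.

```python
import pandas as pd

data = {
    'id': [1, 2, 2, 3],
    'metric': ['some_metric_1', 'some_metric_1', 'some_metric_2', 'some_metric_2'],
    'dicts': [{'a': 161, 'b': 121}, {'a': 152, 'c': 4},
              {'b': 162, 'a': 83}, {'b': 103, 'z': 69}],
}
df = pd.DataFrame(data)

# Turn each dict into a list of (key, value) tuples
pairs = df['dicts'].map(lambda d: list(d.items()))

# explode() gives every tuple its own row and repeats id/metric alongside it
exploded = df[['id', 'metric']].assign(pair=pairs).explode('pair', ignore_index=True)

# Split the tuples into separate key and value columns
kv = pd.DataFrame(exploded['pair'].tolist(), columns=['key', 'value'], index=exploded.index)
long_df = pd.concat([exploded[['id', 'metric']], kv], axis=1)

print(long_df)
```

An empty dict would become a row holding NaN after `explode`, which breaks the tuple-splitting step, so such rows would need to be filtered out first.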

Answer 2

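I find this type of problem is easier to solve in plain Python than in Pandas. Once you're storing dicts in a DataFrame, it's hard to do the kind of fast vectorized operations that make Pandas great for simple numeric/string data.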

Here's my solution, which involves a few comprehensions and zip:

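metrics = df['metric']
dicts = df['dicts']
ids = df['id']
# Repeat each (metric, id) pair once per item in the corresponding dict
metrics, ids = zip(*((m, i) for m, d, i in zip(metrics, dicts, ids) for _ in d))
# Flatten all (key, value) pairs in the same row order
keys, values = zip(*((k, v) for d in dicts for k, v in d.items()))
# Column names 'key' and 'value' match the layout requested in the question
new_data = {'id': ids, 'metric': metrics, 'key': keys, 'value': values}
new_df = pd.DataFrame.from_dict(new_data)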
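The same plain-Python idea can also be written as a single comprehension that emits one flat record per dict item. This is only a sketch: `expand_rows` is a hypothetical helper name, and the inputs are the three columns from the question written out as lists.

```python
def expand_rows(ids, metrics, dicts):
    """Return one flat record per (key, value) pair found in each dict."""
    return [
        {'id': i, 'metric': m, 'key': k, 'value': v}
        for i, m, d in zip(ids, metrics, dicts)
        for k, v in d.items()
    ]


records = expand_rows(
    [1, 2, 2, 3],
    ['some_metric_1', 'some_metric_1', 'some_metric_2', 'some_metric_2'],
    [{'a': 161, 'b': 121}, {'a': 152, 'c': 4}, {'b': 162, 'a': 83}, {'b': 103, 'z': 69}],
)

for record in records:
    print(record)
```

Passing `records` to `pd.DataFrame(records)` should give the long-format frame. Unlike the `zip(*...)` unpacking, this version also returns an empty list for empty input instead of raising a `ValueError`.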

Expanding dict entries into rows with Pandas
