Question

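I have a CSV file in entity-attribute-value format (i.e., my event_id is not unique and is repeated k times, once for each of its k associated attributes):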

    event_id, attribute_id, value
    1, 1, a
    1, 2, b
    1, 3, c
    2, 1, a
    2, 2, b
    2, 3, c
    2, 4, d

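Is there a convenient trick for turning a variable number of attributes (i.e., rows) into columns? The key point is that the output should be a structured n x m table: one row for each of the n events and one column for each of the m = max(k) attributes. Padding missing attributes with NULL would be ideal: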

    event_id, 1, 2, 3, 4
    1, a, b, c, null
    2, a, b, c, d

My plan was to (1) convert the CSV into JSON objects like the following:

    data = [{'value': 'a', 'id': '1', 'event_id': '1', 'attribute_id': '1'},
     {'value': 'b', 'id': '2', 'event_id': '1', 'attribute_id': '2'},
     {'value': 'a', 'id': '3', 'event_id': '2', 'attribute_id': '1'},
     {'value': 'b', 'id': '4', 'event_id': '2', 'attribute_id': '2'},
     {'value': 'c', 'id': '5', 'event_id': '2', 'attribute_id': '3'},
     {'value': 'd', 'id': '6', 'event_id': '2', 'attribute_id': '4'}]
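Step (1) can be sketched with the standard library's `csv.DictReader`, which yields one dict per row keyed by the header names. This is a minimal sketch, assuming the header line is `event_id, attribute_id, value` and that fields are separated by a comma plus a space (hence `skipinitialspace=True`). The sample is read from an in-memory string rather than a file, the helper name `read_eav` is hypothetical, and the synthetic `'id'` key from the listing above is omitted.

```python
import csv
import io

# Sample EAV data from the question, held in memory instead of a file.
SAMPLE = """event_id, attribute_id, value
1, 1, a
1, 2, b
1, 3, c
2, 1, a
2, 2, b
2, 3, c
2, 4, d
"""

def read_eav(text):
    """Hypothetical helper: return one dict per CSV row, keyed by the header names."""
    # skipinitialspace drops the blank after each comma, in the header and in the data rows.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return list(reader)

data = read_eav(SAMPLE)
print(data[0])  # {'event_id': '1', 'attribute_id': '1', 'value': 'a'}
```

All values stay strings, which matches the listing above; convert them with `int()` where numeric ids are needed.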

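(2) Extract the unique event IDs, in order of first appearance so that they line up with the groups built in step (3):

    events = []
    for item in data:
        if item['event_id'] not in events:
            events.append(item['event_id'])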

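(3) Create a list of lists, where each inner list holds the attribute values of the corresponding parent event:

    from itertools import groupby
    attributes = [[k['value'] for k in j] for i, j in groupby(data, key=lambda x: x['event_id'])]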

(4) Create a dictionary that pairs each event with its attributes:

    event_dict = dict(zip(events, attributes))

which looks like this:

    {'1': ['a', 'b'], '2': ['a', 'b', 'c', 'd']}
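Steps (2) to (4) can also be collapsed into a single pass: `groupby` already yields each event id together with its rows, so a dict comprehension keeps every id paired with its own values and does not depend on the iteration order of a separate `events` collection (a `set`, for instance, has no guaranteed order). A minimal sketch, assuming `data` is the list from step (1) with the synthetic `'id'` key omitted; it sorts by `event_id` first because `groupby` only merges adjacent items.

```python
from itertools import groupby

# The row dicts from step (1), without the synthetic 'id' key.
data = [
    {'value': 'a', 'event_id': '1', 'attribute_id': '1'},
    {'value': 'b', 'event_id': '1', 'attribute_id': '2'},
    {'value': 'a', 'event_id': '2', 'attribute_id': '1'},
    {'value': 'b', 'event_id': '2', 'attribute_id': '2'},
    {'value': 'c', 'event_id': '2', 'attribute_id': '3'},
    {'value': 'd', 'event_id': '2', 'attribute_id': '4'},
]

# Sorting (stable) makes groupby safe even if one event's rows are not contiguous.
rows = sorted(data, key=lambda x: x['event_id'])

# Each key from groupby travels with its own group, so ids and values cannot drift apart.
event_dict = {
    event_id: [row['value'] for row in group]
    for event_id, group in groupby(rows, key=lambda x: x['event_id'])
}
print(event_dict)  # {'1': ['a', 'b'], '2': ['a', 'b', 'c', 'd']}
```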

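I'm not sure how to make all the inner lists the same length, padding them with NULL values where necessary. This seems like something that needs to happen in step (3). I have also considered creating n lists of m NULL values, then iterating through each list and filling in the values using attribute_id as the list position, but that seems clunky.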
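The attribute_id-as-position idea from the last paragraph can be written fairly compactly by storing each event's attributes in a dict and building the padded rows only once the widest attribute id is known. A minimal sketch, assuming attribute ids are positive integers that map to column positions 1..m, and using Python's `None` in place of NULL; the function name `pivot_eav` is hypothetical.

```python
from collections import defaultdict

def pivot_eav(rows):
    """Hypothetical helper: turn (event_id, attribute_id, value) triples into padded rows.

    Each value is placed at column attribute_id (1-based); missing attributes stay None.
    Returns (header, table).
    """
    # Collect attribute_id -> value for every event.
    by_event = defaultdict(dict)
    for event_id, attribute_id, value in rows:
        by_event[int(event_id)][int(attribute_id)] = value

    # m = the largest attribute id seen for any event.
    m = max(att for attrs in by_event.values() for att in attrs)

    header = ['event_id'] + list(range(1, m + 1))
    table = [
        [event_id] + [attrs.get(att) for att in range(1, m + 1)]  # .get() yields None for gaps
        for event_id, attrs in sorted(by_event.items())
    ]
    return header, table

rows = [
    ('1', '1', 'a'), ('1', '2', 'b'), ('1', '3', 'c'),
    ('2', '1', 'a'), ('2', '2', 'b'), ('2', '3', 'c'), ('2', '4', 'd'),
]
header, table = pivot_eav(rows)
print(header)  # ['event_id', 1, 2, 3, 4]
for row in table:
    print(row)
# [1, 'a', 'b', 'c', None]
# [2, 'a', 'b', 'c', 'd']
```

Because values are placed by attribute_id rather than appended, an event that lacks, say, attribute 2 gets `None` in column 2 instead of having its later values shifted left. Here m is the largest attribute id, which equals max(k) when ids run from 1 to k without gaps.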

Answer 1

Your basic idea seems right, but I would implement it as follows:

    import csv

    events = {}  # we're going to keep track of the events we read in
    with open('path/to/input', newline='') as infile:
        reader = csv.reader(infile, skipinitialspace=True)  # tolerate the space after each comma
        next(reader)  # skip the header row
        for event, _att, val in reader:
            event = int(event)  # use the same (integer) key for the check and the append
            if event not in events:
                events[event] = []
            events[event].append(val)  # track all the values for this event

    maxAtts = max(len(v) for v in events.values())  # the maximum number of attributes for any event
    with open('path/to/output', 'w', newline='') as outfile:
        writer = csv.writer(outfile)
        writer.writerow(["event_id"] + list(range(1, maxAtts+1)))  # write out the header row
        for k in sorted(events):  # let's look at the events in sorted order
            # write out the event id, all the values for that event, and pad with "null" for any attributes without values
            writer.writerow([k] + events[k] + ['null']*(maxAtts-len(events[k])))
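To try the answer's approach without touching the filesystem, the same logic can be pointed at in-memory buffers. This is a minimal sketch, assuming the sample input from the question (header row included, comma-plus-space separators); `io.StringIO` stands in for the input and output files, and the function name `flatten` is hypothetical. Note that this approach appends values in the order they are read and pads only at the end, so it relies on every event listing its attributes in order and without gaps.

```python
import csv
import io

def flatten(infile, outfile):
    """Hypothetical wrapper: collect values per event, pad with 'null', write CSV."""
    events = {}
    reader = csv.reader(infile, skipinitialspace=True)
    next(reader)  # skip the header row
    for event, _att, val in reader:
        events.setdefault(int(event), []).append(val)

    max_atts = max(len(v) for v in events.values())
    writer = csv.writer(outfile, lineterminator='\n')  # '\n' instead of the default '\r\n'
    writer.writerow(['event_id'] + list(range(1, max_atts + 1)))
    for k in sorted(events):
        writer.writerow([k] + events[k] + ['null'] * (max_atts - len(events[k])))

source = io.StringIO(
    "event_id, attribute_id, value\n"
    "1, 1, a\n1, 2, b\n1, 3, c\n"
    "2, 1, a\n2, 2, b\n2, 3, c\n2, 4, d\n"
)
result = io.StringIO()
flatten(source, result)
print(result.getvalue())
# event_id,1,2,3,4
# 1,a,b,c,null
# 2,a,b,c,d
```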

Flattening an entity-attribute-value (EAV) schema in Python
