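Flatten lists of lists in dictionary values before processing in Pandas

Time: 2017-01-21 22:25:35

Tags: python pandas

Question:

If I need to flatten a list of lists, I would use something like a list comprehension to collapse it into a single list: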

[item for sublist in l for item in sublist]

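I have a dictionary in which some of the values are lists of lists (nested to varying depths), and I need to flatten them into single lists before importing into Pandas.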
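For illustration, here is a small sketch of why the one-level comprehension falls short on this kind of data. The `sample` list is made up for the example (a shortened stand-in for one of the dictionary values). Because strings are themselves iterable, the comprehension splits top-level strings into characters, and it only ever removes one level of nesting.

```python
# Hypothetical, shortened stand-in for one value of the dictionary.
sample = [' a', ' b', [' c', [' d']]]

# The comprehension iterates over every element, strings included, so
# top-level strings are split into characters and deeper lists survive.
one_level = [item for sublist in sample for item in sublist]
print(one_level)  # [' ', 'a', ' ', 'b', ' c', [' d']]
```

So the values need a flattening step that recurses into lists but leaves strings alone.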

Current data:

defaultdict(list,
            {'object network fake-1': [' host 10.0.0.1'],
             'object network fake12': [' host 10.0.0.12'],
             'object network fake2': [' host 10.0.0.2 '],
             'object network fake3': [' host 10.0.0.0 255.255.255.0'],
             'object network fake4': [' host 10.0.0.4'],
             'object network fake5': [' host 10.0.0.5'],
             'object-group network prt-apps': [' network-object object fake-1',
              ' network-object object fake2',
              ' network-object object fake3',
              ' network-object object fake121'],
             'object-group network prt-apps2': [' network-object object fake4',
              ' group-object prt-apps',
              [' network-object object fake-1',
               ' network-object object fake2',
               ' network-object object fake3',
               ' network-object object fake121']],
             'object-group network prt-apps3': [' network-object object fake5',
              ' group-object prt-apps2',
              [' network-object object fake4',
               ' group-object prt-apps',
               [' network-object object fake-1',
                ' network-object object fake2',
                ' network-object object fake3',
                ' network-object object fake121']]]})

Desired data structure:

defaultdict(list,
            {'object network fake-1': [' host 10.0.0.1'],
             'object network fake12': [' host 10.0.0.12'],
             'object network fake2': [' host 10.0.0.2 '],
             'object network fake3': [' host 10.0.0.0 255.255.255.0'],
             'object network fake4': [' host 10.0.0.4'],
             'object network fake5': [' host 10.0.0.5'],
             'object-group network prt-apps': [' network-object object fake-1',
              ' network-object object fake2',
              ' network-object object fake3',
              ' network-object object fake121'],
             'object-group network prt-apps2': [' network-object object fake4',
              ' group-object prt-apps',
              ' network-object object fake-1',
              ' network-object object fake2',
              ' network-object object fake3',
              ' network-object object fake121'],
             'object-group network prt-apps3': [' network-object object fake5',
              ' group-object prt-apps2',
              ' network-object object fake4',
              ' group-object prt-apps',
              ' network-object object fake-1',
              ' network-object object fake2',
              ' network-object object fake3',
              ' network-object object fake121']})

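I searched SO for this and did not find an example I could use. Is there a simple way to flatten these "list of lists" containers in the dictionary values?

This is how I handle other dictionary structures when loading them into Pandas, but it does not work for the first dictionary above: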

pd.DataFrame(dict([ (k,pd.Series(v)) for k,v in asa.iteritems() ]))

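2 answers:

Answer 0: (score: 2)

Here is something that works as far as I understand the problem (for your specific example; it relies on the behavior of + on lists):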

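def unpack(l):
    # Recursively flatten nested lists into one flat list.
    j = []
    for i in l:
        if not isinstance(i, list):
            j.append(i)          # leaf value: keep it as-is
        else:
            j = j + unpack(i)    # nested list: flatten it, then concatenate
    return j

# Flatten every value of the dict l (defined below).
j = {}
for k, v in l.items():
    j[k] = unpack(v)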

Starting with the dict from the example as l:

l = {'object network fake-1': [' host 10.0.0.1'],
     'object network fake12': [' host 10.0.0.12'],
     'object network fake2': [' host 10.0.0.2 '],
     'object network fake3': [' host 10.0.0.0 255.255.255.0'],
     'object network fake4': [' host 10.0.0.4'],
     'object network fake5': [' host 10.0.0.5'],
     'object-group network prt-apps': [' network-object object fake-1',
                                       ' network-object object fake2',
                                       ' network-object object fake3',
                                       ' network-object object fake121'],
     'object-group network prt-apps2': [' network-object object fake4',
                                        ' group-object prt-apps',
                                        [' network-object object fake-1',
                                         ' network-object object fake2',
                                         ' network-object object fake3',
                                         ' network-object object fake121']],
     'object-group network prt-apps3': [' network-object object fake5',
                                        ' group-object prt-apps2',
                                        [' network-object object fake4',
                                         ' group-object prt-apps',
                                         [' network-object object fake-1',
                                          ' network-object object fake2',
                                          ' network-object object fake3',
                                          ' network-object object fake121']]]}

you end up with

j = {'object network fake12': [' host 10.0.0.12'],
     'object-group network prt-apps': [' network-object object fake-1',
                                       ' network-object object fake2',
                                       ' network-object object fake3',
                                       ' network-object object fake121'],
     'object network fake-1': [' host 10.0.0.1'],
     'object network fake2': [' host 10.0.0.2 '],
     'object network fake3': [' host 10.0.0.0 255.255.255.0'],
     'object-group network prt-apps2': [' network-object object fake4',
                                        ' group-object prt-apps',
                                        ' network-object object fake-1',
                                        ' network-object object fake2',
                                        ' network-object object fake3',
                                        ' network-object object fake121'],
     'object-group network prt-apps3': [' network-object object fake5',
                                        ' group-object prt-apps2',
                                        ' network-object object fake4',
                                        ' group-object prt-apps',
                                        ' network-object object fake-1',
                                        ' network-object object fake2',
                                        ' network-object object fake3',
                                        ' network-object object fake121'],
     'object network fake4': [' host 10.0.0.4'],
     'object network fake5': [' host 10.0.0.5']}
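
If the nesting could get deep enough to hit Python's recursion limit, the same idea can be written without recursion. This is only a sketch: `unpack_iterative` and the `example` dict are names made up here, not part of the answer above, and like `unpack` it only descends into `list` objects.

```python
def unpack_iterative(nested):
    """Flatten arbitrarily nested lists without recursion (hypothetical helper)."""
    flat = []
    # Stack of iterators; the innermost list being walked is on top.
    stack = [iter(nested)]
    while stack:
        try:
            item = next(stack[-1])
        except StopIteration:
            stack.pop()               # this level is exhausted, go back up
            continue
        if isinstance(item, list):
            stack.append(iter(item))  # descend into the nested list
        else:
            flat.append(item)         # leaf value: keep it as-is
    return flat


# Made-up sample data, shaped like the dictionary in the question.
example = {'g1': [' a'], 'g2': [' b', ' x', [' a', [' c']]]}
flattened = {k: unpack_iterative(v) for k, v in example.items()}
print(flattened)  # {'g1': [' a'], 'g2': [' b', ' x', ' a', ' c']}
```

Element order is preserved, so the result should match what the recursive version produces.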

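Answer 1: (score: 0)

As a follow-up to the original post: I managed to solve this and flatten the lists in the dictionary with the help of the following generator function.

Taken from here: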

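import collections

# Python 2 version: basestring does not exist on Python 3, and
# Iterable lives in collections.abc there.
def flatten(l):
    for el in l:
        # Recurse into any iterable except strings, which are treated as leaves.
        if isinstance(el, collections.Iterable) and not isinstance(el, basestring):
            for sub in flatten(el):
                yield sub
        else:
            yield el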

Using it on the dictionary as follows gives the desired output:

asa = {k: list(flatten(v)) for k, v in asa.items()}

Note that another version of this function, for Python 3, can be found via the link above.
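
For reference, a Python 3 adaptation might look roughly like the sketch below. It is my own rewrite under the assumption that only `str` and `bytes` need to be treated as leaves, and it is not necessarily identical to the linked version. The sample `asa` dict is made up for the example.

```python
from collections.abc import Iterable


def flatten(l):
    """Yield the leaves of an arbitrarily nested iterable, left to right."""
    for el in l:
        # Strings and bytes are iterable too, but are treated as leaves here.
        if isinstance(el, Iterable) and not isinstance(el, (str, bytes)):
            yield from flatten(el)
        else:
            yield el


# Made-up sample data, shaped like the dictionary in the question.
asa = {'group-a': [' obj-1', ' obj-2'],
       'group-b': [' obj-3', ' group-a', [' obj-1', ' obj-2']]}
asa = {k: list(flatten(v)) for k, v in asa.items()}
print(asa)
```

On Python 3 the DataFrame construction from the question would also need `asa.items()` in place of `asa.iteritems()`, since `dict.iteritems()` no longer exists there.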