Merging dictionaries in a list when members match

Time: 2018-06-21 17:34:04

Tags: python

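I have a list of objects, each with a name, a length, and a list of IP addresses. Whenever the name and the length are both the same, I want to combine the objects into one, concatenating their IP address lists.

That is, given the following JSON input: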

{
    "Localfiles": [{
        "IPAddress": ["217.120.103.158"],
        "FileLength": 7911088,
        "FileName": "desktop.jpeg"
    }, {
        "IPAddress": ["217.120.103.158"],
        "FileLength": 7924192,
        "FileName": "Snelleplanga.mp4"
    }, {
        "IPAddress": ["217.120.103.158"],
        "FileLength": 282,
        "FileName": "desktop.ini"
    }, {
        "IPAddress": ["133.234.44.122"],
        "FileLength": 7911088,
        "FileName": "desktop.jpeg"
    }]
}

...the file desktop.jpeg with length 7911088 appears twice, with two different IP addresses. In the output, these should be merged, as follows:

{
    "Localfiles": [{
        "IPAddress": ["217.120.103.158","133.234.44.122"],
        "FileLength": 7911088,
        "FileName": "desktop.jpeg"
    }, {
        "IPAddress": ["217.120.103.158"],
        "FileLength": 7924192,
        "FileName": "Snelleplanga.mp4"
    }, {
        "IPAddress": ["217.120.103.158"],
        "FileLength": 282,
        "FileName": "desktop.ini"
    }]
}

My current attempt does not actually do this. How can I achieve what I intend?

3 answers:

Answer 0 (score: 1)

A reasonable data structure for this purpose is a mapping from (file name, length) tuples to sets of IP addresses:

import collections

def collate(data):
    addresses=collections.defaultdict(set)
    for item in data:
        addresses[(item['FileName'], item['FileLength'])] |= set(item['IPAddress'])
    return addresses

The output looks something like this:

>>> import json
>>> collate(json.loads(jsonstring)['Localfiles'])
defaultdict(<type 'set'>, {(u'Snelleplanga.mp4', 7924192): set([u'217.120.103.158']), (u'desktop.ini', 282): set([u'217.120.103.158']), (u'desktop.jpeg', 7911088): set([u'217.120.103.158', u'133.234.44.122'])})

If you want to convert that back to the original structure, that is easy to do:

def decollate(data):
    retval = []
    for (k,v) in data.items():  # items() works on Python 2 and 3; iteritems() is Python 2 only
        (file_name, file_length) = k
        retval.append({
            'FileName': file_name,
            'FileLength': file_length,
            'IPAddress': list(v)
        })
    return retval

...with sample output:

>>> from pprint import pprint
>>> pprint(decollate(collate(json.loads(jsonstring)['Localfiles'])))
[{'FileLength': 7924192,
  'FileName': u'Snelleplanga.mp4',
  'IPAddress': [u'217.120.103.158']},
 {'FileLength': 282,
  'FileName': u'desktop.ini',
  'IPAddress': [u'217.120.103.158']},
 {'FileLength': 7911088,
  'FileName': u'desktop.jpeg',
  'IPAddress': [u'217.120.103.158', u'133.234.44.122']}]
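
Because a set is unordered, the order of the IP addresses in the output above is not guaranteed. If that order matters (the desired output in the question lists 217.120.103.158 first), a minimal sketch that keeps first-seen order could look like the following. The function name merge_records and the shortened sample input are illustrative assumptions, not part of the original answer, and the sketch relies on plain dicts preserving insertion order (Python 3.7+).

```python
import json

def merge_records(records):
    """Merge records sharing (FileName, FileLength), keeping IPs in first-seen order."""
    merged = {}  # (FileName, FileLength) -> list of unique IP addresses
    for record in records:
        key = (record["FileName"], record["FileLength"])
        ips = merged.setdefault(key, [])
        for ip in record["IPAddress"]:
            if ip not in ips:  # skip duplicates without reordering
                ips.append(ip)
    # Groups come out in the order their first record appeared.
    return [
        {"IPAddress": ips, "FileLength": length, "FileName": name}
        for (name, length), ips in merged.items()
    ]

# Shortened version of the question's input (hypothetical sample).
sample = json.loads("""
{"Localfiles": [
  {"IPAddress": ["217.120.103.158"], "FileLength": 7911088, "FileName": "desktop.jpeg"},
  {"IPAddress": ["217.120.103.158"], "FileLength": 282, "FileName": "desktop.ini"},
  {"IPAddress": ["133.234.44.122"], "FileLength": 7911088, "FileName": "desktop.jpeg"}
]}
""")
result = {"Localfiles": merge_records(sample["Localfiles"])}
print(json.dumps(result, indent=4))
```

The membership check on a list is linear in the number of IPs per file, which should be fine for short lists; for very long lists a companion set per key would be the usual trade-off.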

Answer 1 (score: 1)

A solution using pandas:

import json
import pandas as pd

j = json.loads(jsonstring)
df = pd.DataFrame(j['Localfiles'])

df1 = df[df.duplicated(['FileLength', 'FileName'], keep=False)].groupby(['FileLength', 'FileName'])['IPAddress'].apply(lambda x: x.sum()).reset_index()    
df2 = df.drop_duplicates(['FileLength', 'FileName'], keep=False)    
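df = pd.concat([df1, df2], ignore_index=True)  # fresh index, so rows from df1 and df2 cannot overwrite each other in to_dict()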

output_json = json.dumps(list(df.T.to_dict().values()))

Output JSON:

'[{'FileLength': 7911088,
  'FileName': 'desktop.jpeg',
  'IPAddress': ['217.120.103.158', '133.234.44.122']},
 {'FileLength': 7924192,
  'FileName': 'Snelleplanga.mp4',
  'IPAddress': ['217.120.103.158']},
 {'FileLength': 282,
  'FileName': 'desktop.ini',
  'IPAddress': ['217.120.103.158']}]'

Answer 2 (score: 0)

A simple solution:

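# Assumes jsonstring has already been parsed into a dict (e.g. with json.loads).
dtmp = {}
for d in jsonstring["Localfiles"]:
    key = (d["FileName"], d["FileLength"])
    # extend, not append of the first element, so entries listing several IPs keep them all
    dtmp.setdefault(key, []).extend(d["IPAddress"])

# Rebuild the original record structure from the grouped IP lists.
lrslt = [{"IPAddress": ips, "FileLength": lth, "FileName": fname} for (fname, lth), ips in dtmp.items()]
drslt = {"Localfiles": lrslt}
print(drslt)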