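I have a list of objects, each with a file name, a length, and a list of IP addresses. Whenever the name and length are the same, I want to combine the objects into one and concatenate their IP address lists.
That is, given the following JSON input: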
{
  "Localfiles": [{
    "IPAddress": ["217.120.103.158"],
    "FileLength": 7911088,
    "FileName": "desktop.jpeg"
  }, {
    "IPAddress": ["217.120.103.158"],
    "FileLength": 7924192,
    "FileName": "Snelleplanga.mp4"
  }, {
    "IPAddress": ["217.120.103.158"],
    "FileLength": 282,
    "FileName": "desktop.ini"
  }, {
    "IPAddress": ["133.234.44.122"],
    "FileLength": 7911088,
    "FileName": "desktop.jpeg"
  }]
}
...the file desktop.jpeg with length 7911088 appears twice, with two different IP addresses. In the output, these should be merged, as follows:
{
  "Localfiles": [{
    "IPAddress": ["217.120.103.158","133.234.44.122"],
    "FileLength": 7911088,
    "FileName": "desktop.jpeg"
  }, {
    "IPAddress": ["217.120.103.158"],
    "FileLength": 7924192,
    "FileName": "Snelleplanga.mp4"
  }, {
    "IPAddress": ["217.120.103.158"],
    "FileLength": 282,
    "FileName": "desktop.ini"
  }]
}
My current attempt, however, does not actually do this. How can I achieve what I intend?
Answer 0 (score: 1)
A reasonable data structure for this purpose is a mapping from (file name, length) tuples to sets of IP addresses:
import collections

def collate(data):
    # Map each (FileName, FileLength) pair to the set of IPs seen for it.
    addresses = collections.defaultdict(set)
    for item in data:
        addresses[(item['FileName'], item['FileLength'])] |= set(item['IPAddress'])
    return addresses
The output looks something like the following (shown as printed by Python 2):
>>> import json
>>> collate(json.loads(jsonstring)['Localfiles'])
defaultdict(<type 'set'>, {(u'Snelleplanga.mp4', 7924192): set([u'217.120.103.158']), (u'desktop.ini', 282): set([u'217.120.103.158']), (u'desktop.jpeg', 7911088): set([u'217.120.103.158', u'133.234.44.122'])})
If you want to convert this back to the original structure, that is easy to do:
def decollate(data):
    retval = []
    # .items() works on both Python 2 and Python 3 (.iteritems() is Python 2 only).
    for (k, v) in data.items():
        (file_name, file_length) = k
        retval.append({
            'FileName': file_name,
            'FileLength': file_length,
            'IPAddress': list(v)
        })
    return retval
...with sample output:
>>> from pprint import pprint
>>> pprint(decollate(collate(json.loads(jsonstring)['Localfiles'])))
[{'FileLength': 7924192,
  'FileName': u'Snelleplanga.mp4',
  'IPAddress': [u'217.120.103.158']},
 {'FileLength': 282,
  'FileName': u'desktop.ini',
  'IPAddress': [u'217.120.103.158']},
 {'FileLength': 7911088,
  'FileName': u'desktop.jpeg',
  'IPAddress': [u'217.120.103.158', u'133.234.44.122']}]
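For Python 3, the grouping and the conversion back to the original structure can be combined into one step. The following is only a minimal sketch: the name merge_records and the shortened sample input are illustrative and not part of the answer above. Because sets have no defined order and cannot be serialized by json.dumps, the sketch emits sorted lists, which is an assumption about what output order is acceptable.

```python
import json
from collections import defaultdict


def merge_records(records):
    """Group records by (FileName, FileLength) and union their IP addresses."""
    grouped = defaultdict(set)
    for rec in records:
        grouped[(rec["FileName"], rec["FileLength"])].update(rec["IPAddress"])
    # Sets are unordered and not JSON-serializable, so emit sorted lists.
    return [
        {"FileName": name, "FileLength": length, "IPAddress": sorted(ips)}
        for (name, length), ips in grouped.items()
    ]


# Shortened version of the question's input (illustrative only).
jsonstring = """
{"Localfiles": [
  {"IPAddress": ["217.120.103.158"], "FileLength": 7911088, "FileName": "desktop.jpeg"},
  {"IPAddress": ["217.120.103.158"], "FileLength": 282, "FileName": "desktop.ini"},
  {"IPAddress": ["133.234.44.122"], "FileLength": 7911088, "FileName": "desktop.jpeg"}
]}
"""

merged = {"Localfiles": merge_records(json.loads(jsonstring)["Localfiles"])}
print(json.dumps(merged, indent=2))
```

Sorting makes the result deterministic, at the cost of losing the order in which the addresses first appeared.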
Answer 1 (score: 1)
A solution using pandas:
import json
import pandas as pd
j = json.loads(jsonstring)
df = pd.DataFrame(j['Localfiles'])
df1 = df[df.duplicated(['FileLength', 'FileName'], keep=False)].groupby(['FileLength', 'FileName'])['IPAddress'].apply(lambda x: x.sum()).reset_index()
df2 = df.drop_duplicates(['FileLength', 'FileName'], keep=False)
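# ignore_index avoids duplicate row labels, which df.T.to_dict() would collapse into one entry
df = pd.concat([df1, df2], ignore_index=True)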
output_json = json.dumps(list(df.T.to_dict().values()))
Output JSON:
'[{'FileLength': 7911088,
'FileName': 'desktop.jpeg',
'IPAddress': ['217.120.103.158', '133.234.44.122']},
{'FileLength': 7924192,
'FileName': 'Snelleplanga.mp4',
'IPAddress': ['217.120.103.158']},
{'FileLength': 282,
'FileName': 'desktop.ini',
'IPAddress': ['217.120.103.158']}]'
Answer 2 (score: 0)
A simple solution:
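# jsonstring is indexed like a dict here, so it must already be parsed (e.g. with json.loads)
dtmp = {}
for d in jsonstring["Localfiles"]:
    key = (d["FileName"], d["FileLength"])
    # extend, not append of the first element, so records with several IPs keep all of them
    dtmp.setdefault(key, []).extend(d["IPAddress"])
lrslt = [{"IPAddress": ips, "FileLength": lth, "FileName": fname} for (fname, lth), ips in dtmp.items()]
drslt = {"Localfiles": lrslt}
print(drslt)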
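One caveat with collecting addresses in a plain list: if the same address shows up twice for the same file, it will appear twice in the result. A possible variant, sketched below under the assumption of Python 3.7+ (where dicts keep insertion order), uses an inner dict as an ordered set so that both the records and the addresses stay in first-seen order. The function name merge_preserving_order and the sample records are made up for illustration.

```python
def merge_preserving_order(records):
    """Merge records sharing (FileName, FileLength), keeping first-seen order."""
    merged = {}  # plain dicts keep insertion order on Python 3.7+
    for rec in records:
        key = (rec["FileName"], rec["FileLength"])
        # The inner dict acts as an ordered set: keys are IPs, values are unused.
        seen = merged.setdefault(key, {})
        for ip in rec["IPAddress"]:
            seen[ip] = None
    return [
        {"IPAddress": list(ips), "FileLength": length, "FileName": name}
        for (name, length), ips in merged.items()
    ]


# Illustrative input: the third record repeats an address already seen.
records = [
    {"IPAddress": ["217.120.103.158"], "FileLength": 7911088, "FileName": "desktop.jpeg"},
    {"IPAddress": ["217.120.103.158"], "FileLength": 282, "FileName": "desktop.ini"},
    {"IPAddress": ["133.234.44.122", "217.120.103.158"], "FileLength": 7911088, "FileName": "desktop.jpeg"},
]
result = merge_preserving_order(records)
print(result)
```

Whether duplicates should be dropped depends on what the list is meant to record; if repeat sightings matter, the plain list version is the one to keep.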