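I have multiple JSON files containing capitals and countries. How do I remove duplicate key-value pairs from all of the files?
This is one of my JSON files: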
{
  "data": [
    {
      "Capital": "Berlin",
      "Country": "Germany"
    },
    {
      "Capital": "New Delhi",
      "Country": "India"
    },
    {
      "Capital": "Canberra",
      "Country": "Australia"
    },
    {
      "Capital": "Beijing.",
      "Country": "China"
    },
    {
      "Capital": "Tokyo",
      "Country": "Japan"
    },
    {
      "Capital": "Tokyo",
      "Country": "Japan"
    },
    {
      "Capital": "Berlin",
      "Country": "Germany"
    },
    {
      "Capital": "Moscow",
      "Country": "Russia"
    },
    {
      "Capital": "New Delhi",
      "Country": "India"
    },
    {
      "Capital": "Ottawa",
      "Country": "Canada"
    }
  ]
}
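There are many JSON files like this one that contain duplicate items. How can I remove the duplicates and keep only the first occurrence of each? I tried the following, but it did not work: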
dupes = []
for f in json_files:
    with open(f) as json_data:
        nations = json.load(json_data)['data']
        # takes care of duplicates and stores them in dupes
        dupes.append(x for x in nations if x['Capital'] in seen or seen.add(x['Capital']))
        nations = [x for x in nations if x not in dupes]  # want to keep the first occurrence of the item present in dupes

    with open(f, 'w') as json_data:
        json.dump({'data': nations}, json_data)
Answer 0: (score: 2)
You may not be able to use a cool list comprehension, but a regular loop should work:
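used_nations = set()   # capitals seen so far ({} would create a dict, which has no add())
unique_nations = []    # build a new list instead of removing items while iterating
for nation in nations:
    if nation['Capital'] not in used_nations:
        used_nations.add(nation['Capital'])
        unique_nations.append(nation)
nations = unique_nations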
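For comparison, the same first-occurrence filter can probably still be written as a list comprehension, by relying on the fact that `set.add()` returns `None`. This is only a sketch: the `nations` list below is a shortened copy of the data in the question, and `unique_nations` is an illustrative name that does not appear in the original post.

```python
# Shortened sample of the 'data' list from the question.
nations = [
    {"Capital": "Berlin", "Country": "Germany"},
    {"Capital": "Tokyo", "Country": "Japan"},
    {"Capital": "Tokyo", "Country": "Japan"},
    {"Capital": "Berlin", "Country": "Germany"},
    {"Capital": "Ottawa", "Country": "Canada"},
]

seen = set()
# set.add() returns None, so "in seen or seen.add(...)" is falsy only the
# first time a capital is encountered; "not" keeps exactly those items.
unique_nations = [
    n for n in nations
    if not (n["Capital"] in seen or seen.add(n["Capital"]))
]

print([n["Capital"] for n in unique_nations])  # ['Berlin', 'Tokyo', 'Ottawa']
```

The side effect hidden inside the condition is what makes this form harder to read than the plain loop, so whether it is worth using is a matter of taste.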
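Answer 1: (score: 1)
List comprehensions are great! But... when an if statement is involved, they can make the code harder to follow.
This is by no means a hard rule; on the contrary, I encourage you to use list comprehensions often. In this particular case, though, a more spread-out solution is more readable.
My suggestion is: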
import json

seen = set()   # capitals already encountered
result = []    # first occurrence of each item, in original order

with open('data.json') as json_data:
    nations = json.load(json_data)['data']

# keep an item only the first time its capital appears
for item in nations:
    if item['Capital'] not in seen:
        seen.add(item['Capital'])
        result.append(item)

with open('data.no_dup.json', 'w') as json_data:
    json.dump({'data': result}, json_data)
Tested and working on Python 3.5.2.
Note that I removed the outer loop for convenience.
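Since the outer loop was left out, here is one way it might be put back to handle several files. This is a sketch under assumptions: `dedupe_file` and `dedupe_files` are hypothetical helper names, each file is rewritten in place as in the question, and duplicates are detected per file rather than across files.

```python
import json


def dedupe_file(path):
    """Rewrite one JSON file in place, keeping the first entry per capital."""
    with open(path) as fp:
        nations = json.load(fp)["data"]

    seen = set()   # capitals already encountered in this file
    result = []    # first occurrences, in original order
    for item in nations:
        if item["Capital"] not in seen:
            seen.add(item["Capital"])
            result.append(item)

    with open(path, "w") as fp:
        json.dump({"data": result}, fp)
    return result


def dedupe_files(paths):
    """Apply dedupe_file to every path; each file gets its own 'seen' set."""
    return {path: dedupe_file(path) for path in paths}
```

A call such as `dedupe_files(json_files)` would then process the list from the question. If duplicates should instead be removed across all files, a single `seen` set would have to be shared between the calls.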
Answer 2: (score: 0)
Here is sample code showing how to do this for the JSON you posted:
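import json

files = ['countries.json']
for f in files:
    with open(f, 'r') as fp:
        nations = json.load(fp)

    # Python 2 syntax. Each dict is turned into a hashable tuple so that a set
    # can drop exact duplicates; note that a set does not preserve the order.
    result = [dict(tupleized) for tupleized in
              set(tuple(item.items()) for item in nations['data'])]
    print result
    print len(result)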
Output:
[{u'Country': u'Russia', u'Capital': u'Moscow'}, {u'Country': u'Japan', u'Capital': u'Tokyo'}, {u'Country': u'Canada', u'Capital': u'Ottawa'}, {u'Country': u'India', u'Capital': u'New Delhi'}, {u'Country': u'Germany', u'Capital': u'Berlin'}, {u'Country': u'Australia', u'Capital': u'Canberra'}, {u'Country': u'China', u'Capital': u'Beijing.'}]
7
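Because a set discards the original order, the output above no longer starts with Berlin. If keeping the first occurrence in its original position matters, a variant along the following lines might be used instead. It is only a sketch in Python 3: the inline `raw` string stands in for a file such as `countries.json`, the names `raw` and `key` are illustrative, and all values are assumed to be hashable (plain strings here).

```python
import json

# Inline stand-in for the contents of one JSON file.
raw = '''{"data": [
  {"Capital": "Berlin", "Country": "Germany"},
  {"Capital": "Tokyo", "Country": "Japan"},
  {"Capital": "Tokyo", "Country": "Japan"},
  {"Country": "Germany", "Capital": "Berlin"}
]}'''

nations = json.loads(raw)["data"]

seen = set()
result = []
for item in nations:
    # Sort the pairs so key order inside a dict does not affect the comparison.
    key = tuple(sorted(item.items()))
    if key not in seen:
        seen.add(key)
        result.append(item)

print(result)
print(len(result))  # 2
```

Unlike the capital-only check in the other answers, this compares every key-value pair, so two entries count as duplicates only when all of their fields match.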