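I have a JSON object full of entries, some of which are randomly repeated. I want to remove the duplicates based on the "word" key
and keep only the first occurrence of each, as in this example: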
{ "word" : "Apple", "meaning" : "First meaning" },
{ "word" : "Ball", "meaning" : " \u090f\u0909\u091f\u093e" },
{ "word" : "Cat", "meaning" : " \u090f\u0909\u091f\u093e" },
{ "word" : "Apple", "meaning" : "Repeated, but has another meaning" },
{ "word" : "Doll", "meaning" : " \u090f\u0909\u091f\u093e" },
I'm a Python beginner, and so far I haven't been able to get past this problem:
#!/usr/bin
import json
source="/var/www/dictionary/repeated.json"
destination="/var/www/dictionary/corrected.json"
def remove_redundant():
    with open(source, "r") as src:
        src_object = json.load(src)
        for i in xrange(len(src_object)):
            escape = 1
            for j in xrange(len(src_object)):
                if src_object[j]["word"] == src_object[i]["word"]:
                    # leave the first occurrence
                    if escape == 1:
                        escape = 2
                        continue
                    else:
                        src_object.pop(j)
    # open(destination, "w+").write(json.dumps(src_object, sort_keys=True, indent=4, separators=(',', ': ')))
    src.close()
remove_redundant()
The error I keep getting is IndexError: list index out of range,
because len keeps changing while I pop items. Thanks for your help.
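The cause described above can be reproduced in a few lines. This is a minimal sketch with made-up sample values, not the original file; it uses range so that it runs on Python 3, where the question's xrange no longer exists. The index range is computed once, before the loop starts, so it still covers the original length after pop() has shortened the list.

```python
# Hypothetical sample data, chosen only to trigger the error.
items = ["Apple", "Ball", "Apple", "Cat"]

error = None
try:
    for j in range(len(items)):          # range(4) is fixed at this point
        if j > 0 and items[j] == "Apple":
            items.pop(j)                 # the list now has only 3 elements
except IndexError as exc:
    error = exc                          # raised when j reaches 3

print(items)
print(repr(error))
```

Building a new list, rather than popping from the one being indexed, avoids the problem entirely.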
Answer 0 (score: 1)
For reference, here is a version that uses pop():
a = [{ "word" : "Apple", "meaning" : "First meaning" },
     { "word" : "Ball", "meaning" : " \u090f\u0909\u091f\u093e" },
     { "word" : "Cat", "meaning" : " \u090f\u0909\u091f\u093e" },
     { "word" : "Apple", "meaning" : "Repeated, but has another meaning" },
     { "word" : "Doll", "meaning" : " \u090f\u0909\u091f\u093e" },]
b = list()
keys = set()
while a:
    x = a.pop(0)
    if x['word'] not in keys:
        keys.add(x['word'])
        b.append(x)
a = b
del b
del keys
a now contains:
[{'meaning': 'First meaning', 'word': 'Apple'},
{'meaning': ' \\u090f\\u0909\\u091f\\u093e', 'word': 'Ball'},
{'meaning': ' \\u090f\\u0909\\u091f\\u093e', 'word': 'Cat'},
{'meaning': ' \\u090f\\u0909\\u091f\\u093e', 'word': 'Doll'}]
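The same set-based idea can be written without pop(0), which shifts every remaining element on each call and also empties the input list. The sketch below is one possible variant, not the answerer's code: the function name dedupe_by_key and its key parameter are hypothetical, and the sample entries are a subset of the question's data.

```python
def dedupe_by_key(records, key="word"):
    """Return a new list keeping only the first record seen for each key value."""
    seen = set()
    result = []
    for record in records:
        value = record[key]
        if value not in seen:
            seen.add(value)          # remember the key value
            result.append(record)    # keep only its first record
    return result


entries = [
    {"word": "Apple", "meaning": "First meaning"},
    {"word": "Ball", "meaning": " \u090f\u0909\u091f\u093e"},
    {"word": "Apple", "meaning": "Repeated, but has another meaning"},
]
unique = dedupe_by_key(entries)
print([entry["word"] for entry in unique])
```

The input list is left untouched, so the original data is still available if the result needs to be checked against it.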
Answer 1 (score: 1)
You can simply do:
from collections import OrderedDict
# data is the list of entries, e.g. the result of json.load()
d = OrderedDict()
for item in data:
    if item["word"] not in d:
        d[item["word"]] = item
print d.values()
Output
[{'meaning': 'First meaning', 'word': 'Apple'},
{'meaning': ' \\u090f\\u0909\\u091f\\u093e', 'word': 'Ball'},
{'meaning': ' \\u090f\\u0909\\u091f\\u093e', 'word': 'Cat'},
{'meaning': ' \\u090f\\u0909\\u091f\\u093e', 'word': 'Doll'}]
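Putting the OrderedDict approach together with the file handling from the question might look like the sketch below. It assumes the top level of the source file is a JSON array of objects, which the question's excerpt suggests but does not show. The function signature is hypothetical: the paths are passed as arguments rather than read from module-level variables. The json.dump options are the ones from the commented-out line in the question.

```python
import json
from collections import OrderedDict


def remove_redundant(source, destination, key="word"):
    """Load a JSON array, keep the first entry per key value, write the result."""
    with open(source, "r") as src:
        entries = json.load(src)     # assumed to be a list of dicts

    first_seen = OrderedDict()
    for entry in entries:
        # setdefault stores the entry only if this key value is new,
        # so later duplicates are ignored.
        first_seen.setdefault(entry[key], entry)

    unique = list(first_seen.values())
    with open(destination, "w") as dst:
        json.dump(unique, dst, sort_keys=True, indent=4, separators=(",", ": "))
    return unique
```

With the question's variables, the call would be remove_redundant(source, destination). The with blocks close both files, so no explicit close() is needed.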