Removing multiple objects from a list using pop

Date: 2014-04-29 07:07:01

Tags: python

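I have a JSON object full of entries, some of which are repeated at random. I want to remove the duplicates based on the "word" key and keep only the first occurrence, as in this example: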

[
    { "word" : "Apple", "meaning" : "First meaning" },
    { "word" : "Ball", "meaning" : " \u090f\u0909\u091f\u093e" },
    { "word" : "Cat", "meaning" : " \u090f\u0909\u091f\u093e" },
    { "word" : "Apple", "meaning" : "Repeated, but has another meaning" },
    { "word" : "Doll", "meaning" : " \u090f\u0909\u091f\u093e" }
]

I'm a Python beginner, and so far I haven't been able to solve this:

#!/usr/bin/env python
import json

source="/var/www/dictionary/repeated.json"
destination="/var/www/dictionary/corrected.json"

def remove_redundant():

    with open(source, "r") as src:      
        src_object = json.load(src)

        for i in xrange(len(src_object)):

            escape = 1

            for j in xrange(len(src_object)):

                if src_object[j]["word"] == src_object[i]["word"]:

                    # leave the first occurrence
                    if escape == 1:
                        escape = 2
                        continue
                    else:
                        src_object.pop(j)

    # open(destination, "w+").write(json.dumps(src_object, sort_keys=True, indent=4, separators=(',', ': ')))

    src.close()

remove_redundant()

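The error I keep getting is IndexError: list index out of range, because the list keeps getting shorter as I pop items from it, while xrange(len(src_object)) was computed from the original length. Thanks for your help.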
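One way to keep the pop()-based approach from the title is to first decide which indices are duplicates and then pop them from the highest index down, so that no removal can shift an index that is still waiting to be removed. A minimal sketch, assuming the data is the list of dicts returned by json.load; the function name remove_duplicates_in_place is hypothetical and not from the question:

```python
def remove_duplicates_in_place(items, key="word"):
    """Remove later duplicates (by `key`) from `items` in place using pop().

    Indices are collected in a first pass and popped from highest to lowest,
    so each pop() only shifts elements that have already been handled.
    """
    seen = set()
    duplicate_indices = []
    for index, item in enumerate(items):
        if item[key] in seen:
            duplicate_indices.append(index)
        else:
            seen.add(item[key])
    for index in reversed(duplicate_indices):
        items.pop(index)
    return items


entries = [{"word": "Apple", "meaning": "First meaning"},
           {"word": "Ball", "meaning": " \u090f\u0909\u091f\u093e"},
           {"word": "Cat", "meaning": " \u090f\u0909\u091f\u093e"},
           {"word": "Apple", "meaning": "Repeated, but has another meaning"},
           {"word": "Doll", "meaning": " \u090f\u0909\u091f\u093e"}]

remove_duplicates_in_place(entries)
print([entry["word"] for entry in entries])  # ['Apple', 'Ball', 'Cat', 'Doll']
```

Because the loop never indexes into the list while it is shrinking, the IndexError from the nested xrange loops cannot occur here.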

2 answers:

Answer 0 (score: 1)

For reference, here is an example that uses pop():
a = [{ "word" : "Apple", "meaning" : "First meaning" },
     { "word" : "Ball", "meaning" : " \u090f\u0909\u091f\u093e" },
     { "word" : "Cat", "meaning" : " \u090f\u0909\u091f\u093e" },
     { "word" : "Apple", "meaning" : "Repeated, but has another meaning" },
     { "word" : "Doll", "meaning" : " \u090f\u0909\u091f\u093e" },]

b = list()
keys = set()

while a:
    x = a.pop(0)
    if x['word'] not in keys:
        keys.add(x['word'])
        b.append(x)
a = b
del b
del keys

a now contains:

[{'meaning': 'First meaning', 'word': 'Apple'},
 {'meaning': ' \\u090f\\u0909\\u091f\\u093e', 'word': 'Ball'},
 {'meaning': ' \\u090f\\u0909\\u091f\\u093e', 'word': 'Cat'},
 {'meaning': ' \\u090f\\u0909\\u091f\\u093e', 'word': 'Doll'}]
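Note that list.pop(0) has to shift every remaining element left, so the loop above does work proportional to the list length for each item. For a large list, collections.deque.popleft() removes from the front in constant time. A sketch of the same loop with that swap, where the helper name dedupe_with_popleft is hypothetical; unlike the answer, it leaves the input list untouched:

```python
from collections import deque


def dedupe_with_popleft(entries, key="word"):
    """Return the first entry for each value of `key`, preserving order."""
    queue = deque(entries)  # a copy, so the caller's list is not emptied
    seen = set()
    result = []
    while queue:
        entry = queue.popleft()  # O(1), unlike list.pop(0)
        if entry[key] not in seen:
            seen.add(entry[key])
            result.append(entry)
    return result


a = [{"word": "Apple", "meaning": "First meaning"},
     {"word": "Ball", "meaning": " \u090f\u0909\u091f\u093e"},
     {"word": "Cat", "meaning": " \u090f\u0909\u091f\u093e"},
     {"word": "Apple", "meaning": "Repeated, but has another meaning"},
     {"word": "Doll", "meaning": " \u090f\u0909\u091f\u093e"}]

a = dedupe_with_popleft(a)
print([item["word"] for item in a])  # ['Apple', 'Ball', 'Cat', 'Doll']
```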

Answer 1 (score: 1)

You can simply do:

from collections import OrderedDict
# data is the list of dicts loaded from the JSON file (e.g. with json.load)
d = OrderedDict()
for item in data:
    if item["word"] not in d:
        d[item["word"]] = item

print d.values()

Output:

[{'meaning': 'First meaning', 'word': 'Apple'},
 {'meaning': ' \\u090f\\u0909\\u091f\\u093e', 'word': 'Ball'},
 {'meaning': ' \\u090f\\u0909\\u091f\\u093e', 'word': 'Cat'},
 {'meaning': ' \\u090f\\u0909\\u091f\\u093e', 'word': 'Doll'}]
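On Python 3.7 and later, plain dicts are guaranteed to preserve insertion order, so the same idea works without OrderedDict; dict.setdefault stores a value only the first time a key is seen. A sketch that also round-trips through the json module, assuming the sample string stands in for the contents of the question's source file; first_occurrences is a hypothetical name:

```python
import json


def first_occurrences(entries, key="word"):
    """Keep only the first entry for each value of `key`, in original order."""
    unique = {}
    for entry in entries:
        # setdefault leaves an existing key untouched, so later duplicates are ignored
        unique.setdefault(entry[key], entry)
    return list(unique.values())


# Stand-in for the contents of the source JSON file.
raw = r"""[
    {"word": "Apple", "meaning": "First meaning"},
    {"word": "Ball", "meaning": " \u090f\u0909\u091f\u093e"},
    {"word": "Cat", "meaning": " \u090f\u0909\u091f\u093e"},
    {"word": "Apple", "meaning": "Repeated, but has another meaning"},
    {"word": "Doll", "meaning": " \u090f\u0909\u091f\u093e"}
]"""

deduped = first_occurrences(json.loads(raw))
print(json.dumps(deduped, indent=4))
```

With real files, json.load(src) and json.dump(deduped, dst, indent=4) take the place of json.loads and json.dumps.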