Python: Finding the differences between two dicts

Time: 2014-10-10 15:19:58

Tags: python dictionary mapping

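I have the following problem: given two dicts with article IDs as keys and title + author as values, I want to compare the two dicts by article ID. If an article ID has a different title/author, I want to create a mapping consisting of a string that outputs the old article ID with its title and author, followed by the new ID with the corresponding title and author.

Example: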

old = {u'2014_en_1': u'Letter A\tauthor A\n', u'2014_en_2': u'Explanation\tauthor B\n', u'2014_en_3': u'Conclusion\tauthor C\n'}
new = {u'2014_en_1': u'Welcome\tauthor XY\n', u'2014_en_2': u'Letter A\tauthor A\n', u'2014_en_3': u'Conclusion\tauthor C\n', u'2014_en_4': u'Explanation\tauthor B\n',}

for k, v in old.iteritems():
    if old[k] != new[k]:
        print k + "\t" + old[k] + # HOW can I find the corresponding article in new?

So the desired output would be:

[]    []    2014_en_1    Welcome\tauthor XY
2014_en_1    Letter A\tauthor A    2014_en_2    Letter A\tauthor A
2014_en_2    Explanation\tauthor B    2014_en_4    Explanation\tauthor B
2014_en_3    Conclusion\tauthor C    2014_en_3    Conclusion\tauthor C

How can I do this? It is tricky because the new dict may contain new articles (and vice versa) :/ Thanks for your help!

2 Answers:

Answer 0: (score: 0)

# Get all keys
keys = set(old) | set(new)

# Reverse the new dict so each value (title + author) maps to its new ID
new_reverse = {v: k for k, v in new.items()}

# Loop over the keys and output
for k in sorted(keys):
    if k in old:
        v = old[k]
        if new.get(k) != v:
            # Changed or removed: look up where the old value lives now
            k_in_new = new_reverse.get(v, '[]')
        else:
            k_in_new = k
        v_in_new = new.get(k_in_new, '[]')

        print('%s %s %s %s' % (k, v, k_in_new, v_in_new))
    else:
        print('[] [] %s %s' % (k, new[k]))
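
One caveat with reversing a dict this way: if two IDs share the same title and author, the comprehension silently keeps only one of them. Below is a minimal sketch, assuming such duplicates can occur, that keeps every ID per value. The helper name `reverse_multi` and the sample data are hypothetical and not part of the answer above; the sketch is written for Python 3.

```python
from collections import defaultdict


def reverse_multi(mapping):
    """Map each value to the sorted list of keys that carry it."""
    reverse = defaultdict(list)
    for key, value in mapping.items():
        reverse[value].append(key)
    # Sort so the result does not depend on dict iteration order
    return {value: sorted(keys) for value, keys in reverse.items()}


# Hypothetical sample with a duplicated title/author pair
sample = {
    '2014_en_1': 'Letter A\tauthor A\n',
    '2014_en_2': 'Letter A\tauthor A\n',  # same title and author as _1
    '2014_en_3': 'Conclusion\tauthor C\n',
}
ids_by_value = reverse_multi(sample)
print(ids_by_value['Letter A\tauthor A\n'])
```

With a multi-valued reverse map the caller has to decide which of the candidate IDs is the right match, for example the first one not yet used.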

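Answer 1: (score: 0)

It gets easier if you reverse the old mapping so that the values (title, author) become the keys.

Then you can iterate over new and try to match the IDs: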

old_reverse = {v: k for k, v in old.items()}
for k, v in new.items():
    try:
        old_k = old_reverse[v]
        print("%s\t%s\t%s\t%s" % (old_k, repr(v), k, repr(v)))
    except KeyError:
        # No old article has this title/author, so it is a new one
        print("[]\t[]\t%s\t%s" % (k, repr(v)))

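Note that I used repr to make the output more readable. You will probably want to apply some string manipulation of your own instead to get the desired output format.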
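
As a rough sketch of what that string manipulation could look like, assuming the only cleanup needed is dropping the trailing newline of each value: the helper names `format_value` and `format_row` are made up for illustration, and the code is written for Python 3.

```python
def format_value(value):
    """Strip the trailing newline; keep the tab between title and author."""
    return value.rstrip('\n')


def format_row(old_id, new_id, value):
    """Build one tab-separated line; old_id=None marks a new article."""
    text = format_value(value)
    if old_id is None:
        return '\t'.join(['[]', '[]', new_id, text])
    return '\t'.join([old_id, text, new_id, text])


print(format_row('2014_en_1', '2014_en_2', 'Letter A\tauthor A\n'))
print(format_row(None, '2014_en_1', 'Welcome\tauthor XY\n'))
```

Unlike repr, this prints a real tab between title and author rather than the escaped `\t`.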

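Dictionaries are unordered collections in Python (from Python 3.7 on they preserve insertion order, which is still not a sorted order). If you want the output sorted, you can add an extra step that stores the rows in a list of tuples and then prints them sorted: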

# Flip the dict
old_reverse = {v: k for k, v in old.items()}

# Map new vs. old
data = []
for k, v in new.items():
    try:
        old_k = old_reverse[v]
        data.append((old_k, v, k, v))
    except KeyError:
        data.append((None, None, k, v))

# Print them sorted by old ID; rows without an old ID (None) come first.
# None is replaced by "" in the key because Python 3 cannot compare None with str.
for old_k, old_v, k, v in sorted(data, key=lambda d: d[0] or ""):
    print("%s\t%s\t%s\t%s" % (
        old_k if old_k is not None else "[]",
        repr(old_v) if old_k is not None else "[]",
        k,
        repr(v),
    ))
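
The loops above only walk over new, so an article that exists in old but has disappeared from new is never reported, although the question mentions that this can happen. Here is a hedged sketch that covers both directions, assuming values are unique within each dict. The function name `diff_articles` is hypothetical, and the code targets Python 3.

```python
def diff_articles(old, new):
    """Return sorted rows (old_id, old_value, new_id, new_value).

    None marks the side on which the article does not exist.
    """
    old_reverse = {v: k for k, v in old.items()}
    new_values = set(new.values())

    rows = []
    # Articles present in new: matched to an old ID, or brand new
    for new_id, value in new.items():
        old_id = old_reverse.get(value)
        old_value = value if old_id is not None else None
        rows.append((old_id, old_value, new_id, value))
    # Articles present in old whose title/author no longer appears in new
    for old_id, value in old.items():
        if value not in new_values:
            rows.append((old_id, value, None, None))

    # None cannot be compared with str in Python 3, so sort on '' instead
    return sorted(rows, key=lambda r: (r[0] or '', r[2] or ''))


old = {'2014_en_1': 'Letter A\tauthor A\n',
       '2014_en_2': 'Explanation\tauthor B\n',
       '2014_en_3': 'Conclusion\tauthor C\n'}
new = {'2014_en_1': 'Welcome\tauthor XY\n',
       '2014_en_2': 'Letter A\tauthor A\n',
       '2014_en_3': 'Conclusion\tauthor C\n',
       '2014_en_4': 'Explanation\tauthor B\n'}

for row in diff_articles(old, new):
    print('\t'.join('[]' if field is None else field.rstrip('\n')
                    for field in row))
```

For the example data this yields the new article first, followed by the three matched articles in order of their old ID.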