Question

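I have the following problem: given two dicts with article IDs as keys and title + author as values, I want to compare the two dicts by article ID. If an article ID has a different title/author, I want to create a mapping consisting of a string that first lists the old article ID with its title and author, followed by the new ID with the corresponding title and author.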

Example:

old = {u'2014_en_1': u'Letter A\tauthor A\n', u'2014_en_2': u'Explanation\tauthor B\n', u'2014_en_3': u'Conclusion\tauthor C\n'}
new = {u'2014_en_1': u'Welcome\tauthor XY\n', u'2014_en_2': u'Letter A\tauthor A\n', u'2014_en_3': u'Conclusion\tauthor C\n', u'2014_en_4': u'Explanation\tauthor B\n',}

for k, v in old.iteritems():
    if old[k] != new[k]:
        print k + "\t" + old[k] + # HOW can I find the corresponding article in new?

So the desired output should be:

[]    []    2014_en_1    Welcome\tauthor XY
2014_en_1    Letter A\tauthor A    2014_en_2    Letter A\tauthor A
2014_en_2    Explanation\tauthor B    2014_en_4    Explanation\tauthor B
2014_en_3    Conclusion\tauthor C    2014_en_3    Conclusion\tauthor C

How can I do this? It is tricky because the new dict may contain new articles (and vice versa) :/ Thanks for your help!

Answer 1

# Get all keys
keys = set(old.keys()).union(set(new.keys()))

# Reverse the new dict: title/author -> new ID
new_reverse = {v: k for k, v in new.items()}

# Loop keys and output
for k in keys:
    if k in old:
        v = old[k]
        if new.get(k) != v:
            # This ID now holds a different article (or none):
            # look up where the old article moved to, if anywhere
            k_in_new = new_reverse.get(v, '[]')
            v_in_new = v if v in new_reverse else '[]'
        else:
            k_in_new = k
            v_in_new = v

        print '%s %s %s %s' % (k, v, k_in_new, v_in_new)
    else:
        print '[] [] %s %s' % (k, new[k])
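
The loop above decides what is "new" by ID, so an article that only moved to a fresh ID is also listed as new. A minimal Python 3 sketch that matches on the title/author string in both directions instead. It assumes every title/author string is unique within each dict (otherwise the reversed dict silently keeps only one ID per value), and `map_articles` is a name made up for this example:

```python
def map_articles(old, new):
    """Return (old_id, old_value, new_id, new_value) tuples; None marks a missing side."""
    # Reverse new so that a title/author string leads to its new ID.
    # Assumption: values are unique within new.
    new_reverse = {v: k for k, v in new.items()}
    rows = []
    for old_id, value in old.items():
        if value in new_reverse:
            rows.append((old_id, value, new_reverse[value], value))
        else:
            # The article was removed: there is nothing to map it to
            rows.append((old_id, value, None, None))
    # Articles whose title/author never appeared in old are brand new
    old_values = set(old.values())
    for new_id, value in new.items():
        if value not in old_values:
            rows.append((None, None, new_id, value))
    return rows


old = {'2014_en_1': 'Letter A\tauthor A\n', '2014_en_2': 'Explanation\tauthor B\n',
       '2014_en_3': 'Conclusion\tauthor C\n'}
new = {'2014_en_1': 'Welcome\tauthor XY\n', '2014_en_2': 'Letter A\tauthor A\n',
       '2014_en_3': 'Conclusion\tauthor C\n', '2014_en_4': 'Explanation\tauthor B\n'}

for row in map_articles(old, new):
    print(row)
```

Returning tuples rather than printing keeps the matching separate from the output format, so the rows can be sorted or formatted afterwards.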

Answer 2

It gets easier if you reverse the old mapping, so that the values (title, author) become the keys.

Then you can iterate over new and try to match the IDs:

old_reverse = {v: k for k, v in old.items()}
for k, v in new.iteritems():
    try:
        old_k = old_reverse[v]
        print "%s\t%s\t%s\t%s" % (old_k, repr(v), k, repr(v),)
    except KeyError:
        print "[]\t[]\t%s\t%s" % (k, repr(v),)

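Note that I used repr to make the output more readable. You may instead want to apply some string manipulation of your own to get the desired output format.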
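
If repr is not what you want, one option is to strip the trailing newline yourself and join the fields with tabs. A small Python 3 sketch, assuming the values end in '\n' as in the question; `format_row` is a hypothetical helper name, not something from the code above:

```python
def format_row(old_id, old_value, new_id, new_value):
    """Build one tab-separated output line; None becomes the '[]' placeholder."""
    fields = []
    for field in (old_id, old_value, new_id, new_value):
        if field is None:
            fields.append('[]')
        else:
            # Drop the trailing newline; the tab between title and author stays
            fields.append(field.rstrip('\n'))
    return '\t'.join(fields)


print(format_row(None, None, '2014_en_1', 'Welcome\tauthor XY\n'))
print(format_row('2014_en_1', 'Letter A\tauthor A\n',
                 '2014_en_2', 'Letter A\tauthor A\n'))
```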

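Dictionaries are unordered collections in Python 2 (insertion order is only guaranteed from Python 3.7 on). If you want the output sorted, you can add an extra step that stores the output in a list of tuples and then prints it sorted: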

# Flip the dict
old_reverse = {v: k for k, v in old.items()}

# Map new VS old
data = []
for k, v in new.iteritems():
    try:
        old_k = old_reverse[v]
        data.append((old_k, v, k, v,))
    except KeyError:
        data.append((None, None, k, v,))

# Print them sorted
for old_k, old_v, k, v in sorted(data, key=lambda d: d[0]):
    print "%s\t%s\t%s\t%s" % (
        old_k if old_k is not None else "[]",
        repr(old_v) if old_k is not None else "[]",
        k, 
        repr(v),
    )
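
One caveat: the sorted call above relies on Python 2 ordering None before strings. In Python 3, comparing None with a str raises TypeError, so the sort key has to handle None explicitly. A sketch of one way to do that, using rows from the question's example; `sort_key` is a name chosen for this illustration:

```python
data = [
    ('2014_en_2', 'Explanation\tauthor B\n', '2014_en_4', 'Explanation\tauthor B\n'),
    (None, None, '2014_en_1', 'Welcome\tauthor XY\n'),
    ('2014_en_1', 'Letter A\tauthor A\n', '2014_en_2', 'Letter A\tauthor A\n'),
]


def sort_key(row):
    old_id = row[0]
    # Tuples compare element by element: False sorts before True, so rows
    # without an old ID come first and None is never compared to a str.
    return (old_id is not None, old_id or '')


ordered = sorted(data, key=sort_key)
for old_id, old_v, k, v in ordered:
    print("%s\t%s\t%s\t%s" % (
        old_id if old_id is not None else "[]",
        repr(old_v) if old_id is not None else "[]",
        k,
        repr(v),
    ))
```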

Python: find the differences between two dicts
