Separating text from a nested OrderedDict in Python

Posted: 2014-04-10 14:18:06

Tags: python ordereddictionary

This is my OrderedDict object:

a=OrderedDict([(u'p', [u'"The Exam Room" is a new series in which everyday medical questions are answered by physicians and professors from the Yale School of Medicine.', u'In our second episode: Dr. Stephen Strittmatter, Vincent Coates Professor of Neurology and director of the Adler Memory Clinic in Neurology, explains when memory loss can become a problem and what you can do to boost your brain power.', OrderedDict([(u'em', u'Produced & Hosted by Noah Golden')])])])

What I am trying to do is get the text out of this object:

>>> a.get('p')

which gives the output:

[u'"The Exam Room" is a new series in which everyday medical questions are answered by physicians and professors from the Yale School of Medicine.', u'In our second episode: Dr. Stephen Strittmatter, Vincent Coates Professor of Neurology and director of the Adler Memory Clinic in Neurology, explains when memory loss can become a problem and what you can do to boost your brain power.', OrderedDict([(u'em', u'Produced & Hosted by Noah Golden')])]

But the result also contains an OrderedDict.

How can I merge the text inside that OrderedDict with the rest?

Expected output:

"The Exam Room" is a new series in which everyday medical questions are answered by physicians and professors from the Yale School of Medicine. In our second episode: Dr. Stephen Strittmatter, Vincent Coates Professor of Neurology and director of the Adler Memory Clinic in Neurology, explains when memory loss can become a problem and what you can do to boost your brain power. Produced & Hosted by Noah Golden

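2 answers:

Answer 0 (score: 2)

If you don't know the nesting of types in advance, the key here is recursion. Here is an example (the text is reformatted for readability):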

#!/usr/bin/env python

import collections

a = collections.OrderedDict([(u'p', [u""" 
    "The Exam Room" is a new series in
    which everyday medical questions are answered by physicians and 
    professors from the Yale School of Medicine.""", 
    u"""In our second episode: Dr. Stephen Strittmatter,
    Vincent Coates Professor of Neurology and director of
    the Adler Memory Clinic in Neurology, explains when 
    memory loss can become a problem and what you can do to 
    boost your brain power.""", 
    collections.OrderedDict([(u'em',
        u'Produced & Hosted by Noah Golden')])])])

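Now flatten the object, which may be a mapping or a list. Three cases are handled: if the value found is a string, we simply append it to collector; if it is a list or a Mapping, we call flatten again; anything else raises a ValueError. Note that you can restrict which keys are followed with the allowed kwarg: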

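from collections.abc import Mapping  # Python 3; on Python 2 use collections.Mapping

def flatten(obj, allowed=(u'p', u'em')):
    """Recursively collect all strings from nested lists and mappings.

    Mapping values are only visited when their key is in `allowed`.
    """
    collector = []

    def process(v):
        if isinstance(v, (list, Mapping)):
            # Nested container: recurse and merge its strings in order.
            collector.extend(flatten(v, allowed=allowed))
        elif isinstance(v, str):  # basestring on Python 2
            # Plain text: keep it.
            collector.append(v)
        else:
            raise ValueError('Cannot handle type: {t}'.format(t=v.__class__))

    if isinstance(obj, list):
        for v in obj:
            process(v)

    if isinstance(obj, Mapping):
        for k, v in obj.items():  # iteritems() on Python 2
            if k in allowed:
                process(v)

    return collector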

if __name__ == '__main__':
    print(flatten(a))
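As an alternative on Python 3, the same recursion can be written as a generator with yield from, which avoids building and merging an intermediate list at every level. This is a minimal sketch, not the answerer's code: iter_text is a hypothetical name, and the two paragraphs from the question are shortened to placeholder strings. It also shows the effect of the allowed kwarg.

```python
from collections import OrderedDict
from collections.abc import Mapping


def iter_text(obj, allowed=('p', 'em')):
    """Yield every string in a nested list/mapping structure, in order.

    Mapping values are visited only when their key is in `allowed`
    (the same rule as flatten() above); unknown types raise ValueError.
    """
    if isinstance(obj, str):
        yield obj
    elif isinstance(obj, list):
        for item in obj:
            yield from iter_text(item, allowed)
    elif isinstance(obj, Mapping):
        for key, value in obj.items():
            if key in allowed:
                yield from iter_text(value, allowed)
    else:
        raise ValueError('Cannot handle type: {t}'.format(t=type(obj)))


# Same shape as the question's object, with the paragraphs shortened.
a = OrderedDict([('p', [
    'First paragraph.',
    'Second paragraph.',
    OrderedDict([('em', 'Produced & Hosted by Noah Golden')]),
])])

print(' '.join(iter_text(a)))
# -> First paragraph. Second paragraph. Produced & Hosted by Noah Golden
print(' '.join(iter_text(a, allowed=('p',))))
# -> First paragraph. Second paragraph.
```

Restricting allowed to ('p',) skips the nested em entry, because its key is no longer followed.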

For your example, the result is a three-element list like this:

[u'"The Exam Room" is a new series ...',
 u'In our second episode: ...',
 u'Produced & Hosted by Noah Golden']

Now, if you want a single string, just join the flattened list:

print(' '.join(flatten(a)))  # a space keeps "Medicine." and "In our..." apart
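Because the strings in this answer were reformatted with triple quotes, they contain newlines and indentation, and joining them carries that whitespace into the result. One way to collapse it is str.split() with no argument followed by a single-space join. The sketch below runs on a hypothetical stand-in for the flattened list (parts, built from truncated fragments of the question's text), not on output copied from the answer.

```python
# Stand-in for flatten(a): strings with embedded newlines and indentation,
# like the triple-quoted literals above produce (truncated fragments).
parts = [
    ' \n    "The Exam Room" is a new series in\n    which everyday medical questions',
    'In our second episode: Dr. Stephen Strittmatter,\n    Vincent Coates Professor of Neurology',
    'Produced & Hosted by Noah Golden',
]

# str.split() with no argument splits on any run of whitespace and drops
# leading/trailing whitespace, so re-joining with ' ' normalizes spacing.
text = ' '.join(' '.join(parts).split())
print(text)
```

This is only needed when the source strings carry layout whitespace; the single-line strings from the question join cleanly with ' '.join alone.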

Answer 1 (score: 1)

That's an odd dictionary, but you can get what you want like this:

[a['p'][0],a['p'][1] + u' ' + a['p'][2]['em']]

Result:

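[u'"The Exam Room" is a new series in which everyday medical questions are answered by physicians and professors from the Yale School of Medicine.', u'In our second episode: Dr. Stephen Strittmatter, Vincent Coates Professor of Neurology and director of the Adler Memory Clinic in Neurology, explains when memory loss can become a problem and what you can do to boost your brain power. Produced & Hosted by Noah Golden']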

This returns a list, as you asked for in your question. If you want a single string instead:

# str.join works on Python 2 and 3 (string.join was removed in Python 3)
u' '.join([a['p'][0], a['p'][1], a['p'][2]['em']])

This results in:

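"The Exam Room" is a new series in which everyday medical questions are answered by physicians and professors from the Yale School of Medicine. In our second episode: Dr. Stephen Strittmatter, Vincent Coates Professor of Neurology and director of the Adler Memory Clinic in Neurology, explains when memory loss can become a problem and what you can do to boost your brain power. Produced & Hosted by Noah Golden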
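The indexing above hard-codes positions 0, 1 and 2 of a['p']. If the list can hold any number of strings and one-level dicts in any order, a small variation avoids the fixed indices. This is a sketch assuming Python 3 and at most one level of nesting (deeper structures need the recursive approach from Answer 0); the paragraphs are shortened to placeholder strings.

```python
from collections import OrderedDict

# Same shape as the question's object, with the paragraphs shortened.
a = OrderedDict([('p', [
    'First paragraph.',
    'Second paragraph.',
    OrderedDict([('em', 'Produced & Hosted by Noah Golden')]),
])])

# Each item is assumed to be either a string or a one-level dict of strings:
# keep a string as-is, or join the dict's values (in insertion order).
pieces = [
    item if isinstance(item, str) else ' '.join(item.values())
    for item in a['p']
]
print(' '.join(pieces))
# -> First paragraph. Second paragraph. Produced & Hosted by Noah Golden
```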