Recursive generator function for nested JSON data in Python

Time: 2016-10-07 10:18:49

Tags: python json function yield

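I'm trying to write a recursive generator function to flatten a nested JSON object that mixes scalar values, lists, and dictionaries. I'm doing this partly for my own learning, so I've avoided grabbing an example from the internet to make sure I understand what is going on. But I'm stuck, and I think the problem is where the yield statement sits in the function relative to the loop.

The data passed to the generator function comes from an outer loop that iterates over a Mongo collection.

When I use a print statement in the same place as the yield statement, I get the results I expect, but when I switch it to a yield statement, the generator seems to produce only one item per iteration.

Hopefully someone can show me where I'm going wrong.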

from pymongo import MongoClient

columns = [
    '_id',
    'name',
    'personId',
    'status',
    'explorerProgress',
    'isSelectedForReview',
]
db = MongoClient().abcDatabase

coll = db.abcCollection


def dic_recurse(data, fields, counter, source_field):
    counter += 1
    if isinstance(data, dict):
        for k, v in data.items():
            if k in fields and isinstance(v, list) is False and isinstance(v, dict) is False:
                # print "{0}{1}".format(source_field, k)[1:], v
                yield "{0}{1}".format(source_field, k)[1:], v
            elif isinstance(v, list):
                source_field += "_{0}".format(k)
                [dic_recurse(l, fields, counter, source_field) for l in data.get(k)]
            elif isinstance(v, dict):
                source_field += "_{0}".format(k)
                dic_recurse(v, fields, counter, source_field)
    elif isinstance(data, list):
        [dic_recurse(l, fields, counter, '') for l in data]


for item in coll.find():
    for d in dic_recurse(item, columns, 0, ''):
        print d

Below is a sample of the data it iterates over; the real nesting goes deeper than what is shown.

{ 
    "_id" : ObjectId("5478464ee4b0a44213e36eb0"), 
    "consultationId" : "54784388e4b0a44213e36d5f", 
    "modules" : [
        {
            "_id" : "FF", 
            "name" : "Foundations", 
            "strategyHeaders" : [
                {
                    "_id" : "FF_Money", 
                    "description" : "Let's see where you're spending your money.", 
                    "name" : "Managing money day to day", 
                    "statuses" : [
                        {
                            "pid" : "54784388e4b0a44213e36d5d", 
                            "status" : "selected", 
                            "whenUpdated" : NumberLong(1425017616062)
                        }, 
                        {
                            "pid" : "54783da8e4b09cf5d82d4e11", 
                            "status" : "selected", 
                            "whenUpdated" : NumberLong(1425017616062)
                        }
                    ], 
                    "strategies" : [
                        {
                            "_id" : "FF_Money_CF", 
                            "description" : "This option helps you get a picture of how much you're spending", 
                            "name" : "Your spending and savings.", 
                            "relatedGoals" : [
                                {
                                    "_id" : ObjectId("54784581e4b0a44213e36e2f")
                                }, 
                                {
                                    "_id" : ObjectId("5478458ee4b0a44213e36e33")
                                }, 
                                {
                                    "_id" : ObjectId("547845a5e4b0a44213e36e37")
                                }, 
                                {
                                    "_id" : ObjectId("54784577e4b0a44213e36e2b")
                                }, 
                                {
                                    "_id" : ObjectId("5478456ee4b0a44213e36e27")
                                }
                            ], 
                            "soaTrashWarning" : "Understanding what you are spending and saving is crucial to helping you achieve your goals. Without this in place, you may be spending more than you can afford. ", 
                            "statuses" : [
                                {
                                    "personId" : "54784388e4b0a44213e36d5d", 
                                    "status" : "selected", 
                                    "whenUpdated" : NumberLong(1425017616062)
                                }, 
                                {
                                    "personId" : "54783da8e4b09cf5d82d4e11", 
                                    "status" : "selected", 
                                    "whenUpdated" : NumberLong(1425017616062)
                                }
                            ], 
                            "trashWarning" : "This option helps you get a picture of how much you're spending and how much you could save.\nAre you sure you don't want to take up this option now?\n\n", 
                            "weight" : NumberInt(1)
                        }, 

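Update: I made some changes to the generator function. I'm not sure they really changed anything, but I have been stepping through both the print version and the yield version in a debugger. The new code is below.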

def dic_recurse(data, fields, counter, source_field):
    print 'Called'
    if isinstance(data, dict):
        for k, v in data.items():
            if isinstance(v, list):
                source_field += "_{0}".format(k)
                [dic_recurse(l, fields, counter, source_field) for l in v]
            elif isinstance(v, dict):
                source_field += "_{0}".format(k)
                dic_recurse(v, fields, counter, source_field)
            elif k in fields and isinstance(v, list) is False and isinstance(v, dict) is False:
                counter += 1
                yield "L{0}_{1}_{2}".format(counter, source_field, k.replace('_', ''))[1:], v
    elif isinstance(data, list):
        for l in data:
            dic_recurse(l, fields, counter, '')

When debugging, the main difference between the two versions seems to appear when this part of the code is hit.

    elif isinstance(data, list):
        for l in data:
            dic_recurse(l, fields, counter, '')

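When I test the yield version, the line that calls dic_recurse(l, fields, counter, '') is hit, but it doesn't seem to call the function: none of the print statements I put at the start of the function fire. If I do the same thing with the print version, then when execution reaches the same place it happily calls the function and runs through the whole of it.

I'm sure I'm misunderstanding something fundamental about generators and the yield statement.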
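
The behaviour described here matches how Python generators work in general: calling a generator function only builds a generator object, and the body does not start running until something iterates over it. A minimal sketch, separate from the code in the question (the names numbers and calls are made up for illustration):

```python
calls = []  # records when the generator body actually starts executing


def numbers(limit):
    # This line runs only once iteration begins, not when numbers() is called.
    calls.append("started")
    for n in range(limit):
        yield n


gen = numbers(3)                   # creates a generator object, nothing else
ran_after_call = list(calls)       # snapshot taken before any iteration

values = list(gen)                 # iterating is what drives the body
ran_after_iteration = list(calls)  # snapshot taken after full iteration

print(ran_after_call, values, ran_after_iteration)
```

A bare call such as dic_recurse(l, fields, counter, '') inside the function therefore creates a generator and discards it without running any of its body, which would explain why the print at the top never fires in the yield version.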

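1 Answer:

Answer 0 (score: 0)

In lieu of any replies, I thought I'd post my updated solution in case it's useful to anyone else.

I needed to add extra yield statements to the function so that the results of each recursive call to the generator function are passed back up to the caller; at least, that's how I understand it. Happy to be corrected.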

def dic_recurse(data, fields, counter, source_field):
    if isinstance(data, dict):
        counter += 1
        for k, v in data.items():
            if isinstance(v, list):
                for field_data in v:
                    for list_field in dic_recurse(field_data, fields, counter, source_field):
                        yield list_field
            elif isinstance(v, dict):
                for dic_field in dic_recurse(v, fields, counter, source_field):
                    yield dic_field
            elif k in fields:  # v is neither a list nor a dict at this point
                yield counter, {"{0}_L{1}".format(k, counter): v}
    elif isinstance(data, list):
        counter += 1
        for list_item in data:
            for li2 in dic_recurse(list_item, fields, counter, ''):
                yield li2
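
On Python 3.3 and later, the nested for ... yield loops above can be written with yield from, which delegates to the sub-generator. The following is a simplified sketch rather than a drop-in replacement: flatten_fields, its depth argument, and the sample dictionary are hypothetical, and it yields (key, depth, value) tuples instead of the counter-based dictionaries used in the answer.

```python
def flatten_fields(data, fields, depth=0):
    """Yield (key, depth, value) for every non-container value whose key is in fields."""
    if isinstance(data, dict):
        for key, value in data.items():
            if isinstance(value, (dict, list)):
                # Delegate to the recursive generator so its items reach our caller.
                yield from flatten_fields(value, fields, depth + 1)
            elif key in fields:
                yield key, depth, value
    elif isinstance(data, list):
        for element in data:
            yield from flatten_fields(element, fields, depth + 1)


# Small made-up document, loosely shaped like the sample in the question.
sample = {
    "_id": "root",
    "modules": [
        {
            "_id": "FF",
            "name": "Foundations",
            "statuses": [{"personId": "p1", "status": "selected"}],
        },
    ],
}

rows = list(flatten_fields(sample, {"_id", "name", "status"}))
for row in rows:
    print(row)
```

Each list and each dictionary adds one level to depth here, so the numbering differs from the counter in the answer; the point of the sketch is only that every recursive call is consumed with yield from instead of being called and dropped.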