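Creating a triply nested JSON in Python

Time: 2016-06-11 18:43:06

Tags: python json

I'm trying to create a JSON document by first building a Python dict, so that the final result has the following structure: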

{"sentences": [
    {"sentence": "At the end of November 2005 , Hong Kong and America had 132 licensed banks , 41 restricted licensed banks , 35 deposit-taking institutions , and 86 representative offices .",
     "parsedSentence": "xyz in text e.g. At the end of November 2005 , LOCATION_SLOT and LOCATION_SLOT had NUMBER_SLOT licensed banks , NUMBER_SLOT restricted licensed banks , NUMBER_SLOT deposit-taking institutions , and NUMBER_SLOT representative offices .",
     "location-value-pairs": [{"America": 132}, {"America": 41}, {"America": 35},
                              {"Hong Kong": 132}, {"Hong Kong": 41}, {"Hong Kong": 35}]}
]}

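But I can't seem to write the code that creates two levels of nested keys and then a third level beneath them, with each key holding an array.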

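My current code is structured as follows. Note that I can't get keys such as "sentence" and "parsedSentence" to be created. Also, I have no named keys: my keys are the sentence strings themselves, which I'd like to move away from so that I can traverse this Python dict more quickly in future: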

for sentence in parsedSentences:
    wordsInSentence = []
    for token in sentence["tokens"]:
        wordsInSentence.append(token["word"])
    sentence = " ".join(wordsInSentence)
    for locationTokenIDs, location in tokenIDs2location.items():
        for numberTokenIDs, number in tokenIDs2number.items():
            if sentence not in sentences2location2values:
                sentences2location2values[sentence] = {}
            if location not in sentences2location2values[sentence]:
                sentences2location2values[sentence][location] = []
            sentences2location2values[sentence][location].append(number)

# Text mode: json.dump writes str, which a "wb" handle rejects under Python 3
with open(outputFile, "w") as out:
    json.dump(sentences2location2values, out)

This gives me JSON that looks like this:

{"Mobutu Sese Seku seized power in 1965 via a coup , renaming the country Zaire , and reigning for the next 32 years as head of a ruthless and corrupt dictatorship .": {"Zaire": [32.0]}, "\u00c3 cents \u00c2 $ \u00c2 cents Movement for the Liberation of the Congo -LRB- MLC -RRB- : Under the direction of Bemba , and backed by Uganda , the MLC was formed in 1998 with 154 soldiers .": {"Congo": [154.0], "Uganda": [154.0]}, ...

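This doesn't give me the structure I need.

How can I arrive at a solution that lets me fill in the right keys and values one at a time, in the right part of the loop, rather than a one-line solution?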

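1 answer:

Answer 0 (score: 1)

There seems to be a mismatch between the desired output at the top of the question and what the code actually does, since the code never creates the keys "sentence", "parsedSentence", or "location-value-pairs".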
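To make the mismatch concrete, here is a minimal sketch of the shape the question's loop builds (sentence text, then location, then a list of numbers) and one way that shape could be turned into the desired layout. The sample data and the to_target_shape helper are made up for illustration, and "parsedSentence" is left out because the flat dict doesn't carry it.

```python
# Made-up data in the shape the question's loop produces:
# sentence text -> location -> list of numbers.
sentences2location2values = {
    "The MLC was backed by Uganda and formed in the Congo with 154 soldiers .": {
        "Congo": [154.0],
        "Uganda": [154.0],
    },
}


def to_target_shape(flat):
    """Reshape the flat dict into {"sentences": [ {...}, ... ]} (hypothetical helper)."""
    sentences = []
    for sentence, location2values in flat.items():
        # One single-entry dict per (location, value) combination.
        pairs = [
            {location: value}
            for location, values in location2values.items()
            for value in values
        ]
        sentences.append({"sentence": sentence, "location-value-pairs": pairs})
    return {"sentences": sentences}


reshaped = to_target_shape(sentences2location2values)
print(reshaped)
```

The flat version has the sentence strings as its top-level keys, while the reshaped version has a single "sentences" key holding a list, which is what the desired output shows.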

That may just mean I've misunderstood the question, but if not, you could try something like this:

# Top-level container: a single "sentences" key holding a list
output = {"sentences": []}

for sentence in parsedSentences:

    # One dict per sentence; store the parsed form before "sentence" is rebound
    sentenceDict = {"parsedSentence": sentence}

    # Rebuild the plain sentence text from its tokens
    wordsInSentence = []
    for token in sentence["tokens"]:
        wordsInSentence.append(token["word"])
    sentence = " ".join(wordsInSentence)

    sentenceDict["sentence"] = sentence

    sentenceDict["location-value-pairs"] = []

    # Pair every location with every number, one single-entry dict per pair
    for locationTokenIDs, location in tokenIDs2location.items():
        for numberTokenIDs, number in tokenIDs2number.items():
            sentenceDict["location-value-pairs"].append({location: number})

    output["sentences"].append(sentenceDict)
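For reference, here is a self-contained sketch of the same approach that can be run on its own. The sample values for parsedSentences, tokenIDs2location, and tokenIDs2number are made up, build_output is a hypothetical helper name, and the token-ID keys are assumed to be tuples. The question doesn't show how these mappings are built or whether they are recomputed for each sentence, so treat this as an illustration rather than a drop-in replacement.

```python
import json

# Hypothetical sample input standing in for the question's real data.
parsedSentences = [
    {"tokens": [{"word": "Hong"}, {"word": "Kong"}, {"word": "had"},
                {"word": "132"}, {"word": "banks"}, {"word": "."}]},
]
tokenIDs2location = {(0, 1): "Hong Kong"}  # assumed: token indices -> location
tokenIDs2number = {(3,): 132}              # assumed: token indices -> number


def build_output(parsed_sentences, ids2location, ids2number):
    """Build {"sentences": [...]} with one dict per sentence."""
    output = {"sentences": []}
    for parsed in parsed_sentences:
        # Rebuild the plain sentence text from its tokens.
        text = " ".join(token["word"] for token in parsed["tokens"])
        # Pair every location with every number.
        pairs = [
            {location: number}
            for location in ids2location.values()
            for number in ids2number.values()
        ]
        output["sentences"].append({
            "parsedSentence": parsed,  # raw parsed form, as in the code above
            "sentence": text,
            "location-value-pairs": pairs,
        })
    return output


output = build_output(parsedSentences, tokenIDs2location, tokenIDs2number)
json_text = json.dumps(output, indent=2)
print(json_text)
```

To write the result to a file instead, open the file in text mode and call json.dump(output, out). Under Python 3, a file opened with "wb" rejects the str that json.dump writes.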