I'm trying to produce JSON by first building a Python dict, and I ultimately want the following structure:
{"sentences": [
    {"sentence": "At the end of November 2005 , Hong Kong and America had 132 licensed banks , 41 restricted licensed banks , 35 deposit-taking institutions , and 86 representative offices .",
     "parsedSentence": "xyz in text e.g. At the end of November 2005 , LOCATION_SLOT and LOCATION_SLOT had NUMBER_SLOT licensed banks , NUMBER_SLOT restricted licensed banks , NUMBER_SLOT deposit-taking institutions , and NUMBER_SLOT representative offices .",
     "location-value-pairs": [{"America": 132}, {"America": 41}, {"America": 35},
                              {"Hong Kong": 132}, {"Hong Kong": 41}, {"Hong Kong": 35}]}
]}
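However, I can't seem to write code that creates two levels of nested keys and then a third key under those, each holding an array.
My code is currently structured as follows. Note that I can't get keys like "sentence", "parsedSentence" and so on to be created. Note also that I have no key variables (my keys are the sentence strings themselves), which I'd like to move away from so that I can iterate over this Python dict more quickly in future: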
for sentence in parsedSentences:
    wordsInSentence = []
    for token in sentence["tokens"]:
        wordsInSentence.append(token["word"])
    # From here on, sentence is the joined string, not the parsed dict.
    sentence = " ".join(wordsInSentence)
    for locationTokenIDs, location in tokenIDs2location.items():
        for numberTokenIDs, number in tokenIDs2number.items():
            if sentence not in sentences2location2values:
                sentences2location2values[sentence] = {}
            if location not in sentences2location2values[sentence]:
                sentences2location2values[sentence][location] = []
            sentences2location2values[sentence][location].append(number)
# Text mode: json.dump writes str, so "wb" fails on Python 3.
with open(outputFile, "w") as out:
    json.dump(sentences2location2values, out)
This gives me JSON that looks like this:
{"Mobutu Sese Seku seized power in 1965 via a coup , renaming the country Zaire , and reigning for the next 32 years as head of a ruthless and corrupt dictatorship .": {"Zaire": [32.0]}, "\u00c3 cents \u00c2 $ \u00c2 cents Movement for the Liberation of the Congo -LRB- MLC -RRB- : Under the direction of Bemba , and backed by Uganda , the MLC was formed in 1998 with 154 soldiers .": {"Congo": [154.0], "Uganda": [154.0]}, ...
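This is not the structure I need.
How can I write a solution that lets me fill in the right keys and values one at a time, at the right points in the loop, rather than a one-liner?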
Answer 0 (score: 1)
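There seems to be a mismatch between the desired output at the top of the question and what the code actually does, since the code never creates the keys "sentence", "parsedSentence" and "location-value-pairs".
That may just mean I've misunderstood the question, but if not, you could try something like this: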
output = {"sentences": []}
for sentence in parsedSentences:
    # sentence is still the parsed dict here; substitute the slotted
    # string instead if that is what "parsedSentence" should hold.
    sentenceDict = {"parsedSentence": sentence}
    wordsInSentence = []
    for token in sentence["tokens"]:
        wordsInSentence.append(token["word"])
    sentence = " ".join(wordsInSentence)
    sentenceDict["sentence"] = sentence
    sentenceDict["location-value-pairs"] = []
    # Pair every location with every number, one small dict per pair.
    for locationTokenIDs, location in tokenIDs2location.items():
        for numberTokenIDs, number in tokenIDs2number.items():
            sentenceDict["location-value-pairs"].append({location: number})
    # One entry per sentence, appended after its pairs are filled in.
    output["sentences"].append(sentenceDict)
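For reference, here is a minimal self-contained sketch of the same idea that can be run as is. The question does not show how tokenIDs2location and tokenIDs2number are built, so the input layout below (the "slotted", "locations" and "numbers" fields, the sample sentence, and the KEY_* constant names) is assumed purely for illustration. Holding the key strings in constants is one way to avoid retyping them when iterating over the result later.

```python
import json

# Key names kept in one place (constant names are illustrative).
KEY_SENTENCES = "sentences"
KEY_SENTENCE = "sentence"
KEY_PARSED = "parsedSentence"
KEY_PAIRS = "location-value-pairs"


def build_output(parsed_sentences):
    """Build {"sentences": [...]} by filling each key step by step.

    Assumes each input item carries "tokens", a pre-slotted string under
    "slotted", and "locations"/"numbers" dicts keyed by token-ID tuples.
    This layout is hypothetical, not taken from the question.
    """
    output = {KEY_SENTENCES: []}
    for parsed in parsed_sentences:
        words = [token["word"] for token in parsed["tokens"]]
        entry = {
            KEY_SENTENCE: " ".join(words),
            KEY_PARSED: parsed["slotted"],
            KEY_PAIRS: [],
        }
        # Pair every location with every number found in this sentence.
        for location in parsed["locations"].values():
            for number in parsed["numbers"].values():
                entry[KEY_PAIRS].append({location: number})
        output[KEY_SENTENCES].append(entry)
    return output


# Made-up sample input, shaped like the assumed layout above.
sample = [{
    "tokens": [{"word": w}
               for w in "Hong Kong and America had 132 banks .".split()],
    "slotted": "LOCATION_SLOT and LOCATION_SLOT had NUMBER_SLOT banks .",
    "locations": {(0, 1): "Hong Kong", (3,): "America"},
    "numbers": {(5,): 132},
}]

result = build_output(sample)
print(json.dumps(result, indent=2))
```

Because every sentence becomes one element of the "sentences" list, later code can loop over result[KEY_SENTENCES] and read each field by its constant rather than by the sentence text.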