Just the Q&A from Jeopardy JSON

Date: 2017-11-01 21:26:05

Tags: python parsing

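I downloaded this set of 200k Jeopardy questions and answers. I thought it would be fun to plug it into a trivia bot. Anyway, the file is 50 MB, with no line breaks that I can see.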

I just want to pull all the questions and answers out of this monster into a file format like:

"question":

Here is part of the file:

[{"category": "HISTORY", "air_date": "2004-12-31", "question": "'For the last 8 years of his life, Galileo was under house arrest for espousing this man's theory'", "value": "$200", "answer": "Copernicus", "round": "Jeopardy!", "show_number": "4680"},
 {"category": "ESPN's TOP 10 ALL-TIME ATHLETES", "air_date": "2004-12-31", "question": "'No. 2: 1912 Olympian; football star at Carlisle Indian School; 6 MLB seasons with the Reds, Giants & Braves'", "value": "$200", "answer": "Jim Thorpe", "round": "Jeopardy!", "show_number": "4680"},
 {"category": "EVERYBODY TALKS ABOUT IT...", "air_date": "2004-12-31", "question": "'The city of Yuma in this state has a record average of 4,055 hours of sunshine each year'", "value": "$200", "answer": "Arizona", "round": "Jeopardy!", "show_number": "4680"}, ...

I know I can't go line by line, and I know I can't load the whole thing into memory. But I also know that what I want is the first quoted string after "question":, and the answer is the first quoted string directly after "answer":.
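If the file really could not be loaded in one go, one option is to read it in fixed-size chunks and decode one array element at a time with the standard library's json.JSONDecoder.raw_decode. This is a minimal sketch, assuming the file is a single top-level JSON array whose elements are objects, like the sample above; the function name iter_json_array and the 64 KB default chunk size are my own choices, not anything from the question.

```python
import io
import json
from typing import Iterator, TextIO

_SKIP = " \t\r\n,"  # whitespace and the commas between array elements


def iter_json_array(fp: TextIO, chunk_size: int = 1 << 16) -> Iterator[dict]:
    """Yield the objects of a top-level JSON array one at a time.

    Assumes the file is one JSON array whose elements are objects. Only the
    unparsed tail of the file is buffered, so memory use stays around
    chunk_size plus one object rather than the whole file.
    """
    decoder = json.JSONDecoder()
    buf, pos = "", 0
    in_array = False  # becomes True once the opening '[' is consumed

    def fill() -> bool:
        # Drop the already-parsed prefix and append the next chunk; False at EOF.
        nonlocal buf, pos
        chunk = fp.read(chunk_size)
        buf, pos = buf[pos:] + chunk, 0
        return chunk != ""

    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(buf) and buf[pos] in _SKIP:
            pos += 1
        if pos == len(buf):
            if not fill():
                raise ValueError("unexpected end of file inside the array")
            continue
        if not in_array:
            if buf[pos] != "[":
                raise ValueError("expected the file to start with '['")
            in_array = True
            pos += 1
            continue
        if buf[pos] == "]":
            return
        try:
            obj, end = decoder.raw_decode(buf, pos)
        except json.JSONDecodeError:
            # Most likely an object cut off at the chunk boundary: read more.
            if not fill():
                raise
            continue
        yield obj
        pos = end


if __name__ == "__main__":
    # Tiny in-memory demo; a real run would pass an open file instead.
    demo = io.StringIO('[{"question": "q1", "answer": "a1"}]')
    for clue in iter_json_array(demo, chunk_size=8):
        print(clue["question"], clue["answer"])
```

With a real file this would be used as `with open(path, encoding="utf-8") as fp: for clue in iter_json_array(fp): ...`. Because every element is an object, a chunk that ends mid-object can never decode successfully, so retrying after reading more text is safe. If a third-party dependency is acceptable, the ijson package offers this kind of incremental parsing ready-made.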

1 Answer:

Answer 0 (score: 0)

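Load the JSON into a list of dicts (at roughly 50 MB it should fit in memory on a typical machine), then print the 'question' and 'answer' keys of each dictionary:

import json
d = json.load(open('jeopardy.json'))  # placeholder file name
for l in d:
    print(l['question'], l['answer'])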

Output:

'For the last 8 years of his life, Galileo was under house arrest for espousing this man's theory' Copernicus
'No. 2: 1912 Olympian; football star at Carlisle Indian School; 6 MLB seasons with the Reds, Giants & Braves' Jim Thorpe
'The city of Yuma in this state has a record average of 4,055 hours of sunshine each year' Arizona
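To get the pairs into a file rather than onto the screen, here is a minimal sketch. It assumes a tab-separated "question, tab, answer" line per clue (the question does not pin down the exact layout), and it drops the single quotes that wrap each question in the sample data. The helper names unquote and write_pairs are hypothetical.

```python
import io
from typing import Iterable, TextIO


def unquote(text: str) -> str:
    """Remove one pair of surrounding single quotes, if present."""
    if len(text) >= 2 and text[0] == text[-1] == "'":
        return text[1:-1]
    return text


def write_pairs(clues: Iterable[dict], out: TextIO) -> int:
    """Write one 'question<TAB>answer' line per clue and return the count.

    The tab-separated layout is an assumption, not a format from the question.
    """
    count = 0
    for clue in clues:
        # Collapse any tabs/newlines inside the text so each pair stays on one line.
        question = " ".join(unquote(clue["question"]).split())
        answer = " ".join(clue["answer"].split())
        out.write(f"{question}\t{answer}\n")
        count += 1
    return count


if __name__ == "__main__":
    sample = [
        {"question": "'For the last 8 years of his life, Galileo was under house arrest for espousing this man's theory'", "answer": "Copernicus"},
        {"question": "'The city of Yuma in this state has a record average of 4,055 hours of sunshine each year'", "answer": "Arizona"},
    ]
    buf = io.StringIO()
    write_pairs(sample, buf)
    print(buf.getvalue(), end="")
```

write_pairs accepts any iterable of dicts, so the list returned by json.load can be passed straight in together with an output file opened with `open(..., "w", encoding="utf-8")`. Only one pair of wrapping quotes is removed, so an apostrophe that legitimately ends a question is kept.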