python

Time: 2015-10-10 14:17:10

Tags: python json dictionary nested

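I have a very frustrating problem. I want to fetch all categories, subcategories, sub-subcategories and so on from Wikipedia and put them into one huge nested dictionary.

My problem is that once I have the top-level category (Categorie:Alles), I can run the loop again over the subcategories I find, but I cannot get them nested inside my dictionary.

Can anyone help or spot the mistake?

Thanks in advance,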

import requests  # http://docs.python-requests.org/en/latest/
import json
from bs4 import BeautifulSoup

category = 'Categorie:Alles'

def wiki_api_request(category):
    url = ('http://nl.wikipedia.org/w/api.php?format=json&action=query&list=categorymembers&cmtitle=%s&cmlimit=500')%category
    return url

category_dict = {}

def crawl(category_name, _dict):
    url = wiki_api_request(category_name)

    _url = requests.get(url)


    extract = _url.json()

    category_amount = 0
    if 'query' in extract:
        category_list_json = extract['query']['categorymembers']
        _dict[category_name] = {category['title'] for category in category_list_json}

        for category in category_list_json:
            if 'Categorie:' in category['title']:
                crawl(category['title'], _dict[category_name])  # <- this gives an error
                break

crawl(category, category_dict)
print category_dict

Error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-40-b8027c8281eb> in <module>()
     29                 break
     30 
---> 31 crawl(category, category_dict)
     32 print category_dict

<ipython-input-40-b8027c8281eb> in crawl(category_name, _dict)
     26         for category in category_list_json:
     27             if 'Categorie:' in category['title']:
---> 28                 crawl(category['title'], _dict[category_name])
     29                 break
     30 

<ipython-input-40-b8027c8281eb> in crawl(category_name, _dict)
     22     if 'query' in extract:
     23         category_list_json = extract['query']['categorymembers']
---> 24         _dict[category_name] = {category['title'] for category in category_list_json}
     25 
     26         for category in category_list_json:

TypeError: 'set' object does not support item assignment

1 answer:

Answer 0 (score: 2)

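{category['title'] for category in category_list_json} is a set comprehension, not a dict comprehension, so the value stored in _dict[category_name] is a set. The recursive call then passes that set in as _dict, and the assignment _dict[category_name] = ... fails, because a set does not support item assignment.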
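To make the difference concrete, here is a small sketch that builds both comprehensions from the same input and then tries an item assignment on each. The `members` list is made-up sample data shaped like the API's `categorymembers` entries, not real Wikipedia output.

```python
# Made-up sample data, shaped like extract['query']['categorymembers'].
members = [{'title': 'Categorie:A'}, {'title': 'Categorie:B'}]

# Braces without a colon build a set of titles.
as_set = {m['title'] for m in members}

# Braces with "key: value" build a dict mapping each title to an empty dict.
as_dict = {m['title']: {} for m in members}

# Item assignment on the set raises the TypeError from the traceback.
try:
    as_set['Categorie:A'] = {}
    set_error = None
except TypeError as exc:
    set_error = str(exc)

# The same kind of assignment works on the dict, so children can be nested.
as_dict['Categorie:A']['Categorie:A1'] = {}

print(type(as_set).__name__, type(as_dict).__name__)
print(set_error)
```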

You probably want the comprehension to produce a dictionary whose values are empty dictionaries, so

{category['title']:{} for category in category_list_json}

or, more explicitly,

{category['title']:dict() for category in category_list_json}
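Putting the fix into the recursive function might look roughly like the sketch below. It is not the asker's exact code. To keep it runnable without network access, the HTTP request is replaced by a hypothetical `fetch_members` callable and an invented `FAKE_WIKI` table. The `seen` set is also an addition, included in case the category graph contains a loop, and the `break` from the question is dropped so that every subcategory is visited.

```python
def crawl(category_name, tree, fetch_members, seen=None):
    """Store a nested dict of category_name's members in tree[category_name]."""
    if seen is None:
        seen = set()
    seen.add(category_name)

    titles = fetch_members(category_name)
    # Dict comprehension: every member starts with an empty dict for its children.
    tree[category_name] = {title: {} for title in titles}

    for title in titles:
        # Recurse only into subcategories that have not been visited yet.
        if title.startswith('Categorie:') and title not in seen:
            crawl(title, tree[category_name], fetch_members, seen)


# Invented stand-in for the Wikipedia API (assumption, not real data).
FAKE_WIKI = {
    'Categorie:Alles': ['Categorie:Kunst', 'Categorie:Wetenschap', 'Hoofdpagina'],
    'Categorie:Kunst': ['Categorie:Muziek', 'Rembrandt'],
    'Categorie:Muziek': [],
    'Categorie:Wetenschap': ['Categorie:Alles'],  # deliberate loop back to the root
}


def fake_fetch(category_name):
    """Hypothetical replacement for the request made in wiki_api_request."""
    return FAKE_WIKI.get(category_name, [])


category_dict = {}
crawl('Categorie:Alles', category_dict, fake_fetch)
print(category_dict)
```

To use this against the real API, `fetch_members` could wrap the `requests.get(...).json()` call from the question and return the list of `title` values. Note that the question's request asks for at most 500 members per call, so large categories would need additional handling.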