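Getting wrong results from JSON - Python 3

Time: 2017-07-22 20:48:16

Tags: python json python-3.x

I'm working on a small project that uses Python 3 to retrieve information about books from the Google Books API. To do this, I call the API, read out the variables and store them in a list. This works perfectly for a search like "linkedin". However, when I enter "Google", it reads the second title from the JSON input. How can this happen?

Please find my code below (Google_Results is a class I use to initialize the variables):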

import requests
def Book_Search(search_term):
    parms = {"q": search_term, "maxResults": 3}
    r = requests.get(url="https://www.googleapis.com/books/v1/volumes", params=parms)
    print(r.url)

    results = r.json()
    i = 0
    for result in results["items"]:
        try:
            isbn13 = str(result["volumeInfo"]["industryIdentifiers"][0]["identifier"])
            isbn10 = str(result["volumeInfo"]["industryIdentifiers"][1]["identifier"])
            title = str(result["volumeInfo"]["title"])
            author = str(result["volumeInfo"]["authors"])[2:-2]
            publisher = str(result["volumeInfo"]["publisher"])
            published_date = str(result["volumeInfo"]["publishedDate"])
            description = str(result["volumeInfo"]["description"])
            pages = str(result["volumeInfo"]["pageCount"])
            genre = str(result["volumeInfo"]["categories"])[2:-2]
            language = str(result["volumeInfo"]["language"])
            image_link = str(result["volumeInfo"]["imageLinks"]["thumbnail"])

            dict = Google_Results(isbn13, isbn10, title, author, publisher, published_date, description, pages, genre,
                           language, image_link)
            gr.append(dict)
            print(gr[i].title)
            i += 1
        except:
            pass
    return

gr = []
Book_Search("Linkedin")

I'm a beginner in Python, so any help is appreciated!

2 answers:

Answer 0 (score: 1)

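It does so because the volumeInfo of the first entry has no publisher key, so the lookup raises a KeyError and your except catches it. If you're going to work with loosely structured data, you have to account for the fact that it won't always have the expected structure. For simple cases you can rely on dict.get() and its default argument to return a "valid" default value when an entry is missing.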
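To make the dict.get() point concrete, here is a minimal sketch. The sample item is invented for illustration and only mimics the shape of one API result; it is not real API output.

```python
# Sample data shaped like one Google Books item; the values are invented.
item = {"volumeInfo": {"title": "Example Title", "imageLinks": {}}}

info = item.get("volumeInfo", {})

# A missing key yields the default instead of raising KeyError.
publisher = info.get("publisher", "unknown")

# Chained lookups stay safe when each intermediate default is an empty dict.
thumbnail = info.get("imageLinks", {}).get("thumbnail")

print(publisher)  # unknown
print(thumbnail)  # None
```

With info["publisher"] the same lookup would raise KeyError, which is what sends the first "Google" result into the except branch.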

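Also, there are a few conceptual problems with your function: it relies on a global gr, which is poor design; it shadows the built-in dict type; and it catches all exceptions with a bare except, which also swallows KeyboardInterrupt, so pressing Ctrl+C (SIGINT) inside the try block will not stop your code... I'd suggest converting it into something saner: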

import requests


def book_search(search_term, max_results=3):
    results = []  # a list to store the results
    parms = {"q": search_term, "maxResults": max_results}
    r = requests.get(url="https://www.googleapis.com/books/v1/volumes", params=parms)
    try:  # just in case the server doesn't return valid JSON
        for result in r.json().get("items", []):
            if "volumeInfo" not in result:  # invalid entry - missing volumeInfo
                continue
            result_dict = {}  # a dictionary to store our discovered fields
            result = result["volumeInfo"]  # all the data we're interested in is in volumeInfo
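            isbns = result.get("industryIdentifiers", None)  # capture ISBNs
            if isinstance(isbns, list):
                # match on each identifier's "type" instead of assuming a fixed order
                for entry in isbns:
                    if not isinstance(entry, dict):
                        continue
                    if entry.get("type") == "ISBN_10":
                        result_dict["isbn10"] = entry.get("identifier", None)
                    elif entry.get("type") == "ISBN_13":
                        result_dict["isbn13"] = entry.get("identifier", None)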
            result_dict["title"] = result.get("title", None)
            authors = result.get("authors", None)  # capture authors
            if isinstance(authors, list) and authors:
                # the question's [2:-2] only stripped "['" and "']" from str(list)
                result_dict["author"] = ", ".join(str(a) for a in authors)
            result_dict["publisher"] = result.get("publisher", None)
            result_dict["published_date"] = result.get("publishedDate", None)
            result_dict["description"] = result.get("description", None)
            result_dict["pages"] = result.get("pageCount", None)
            genres = result.get("categories", None)  # capture genres
            if isinstance(genres, list) and genres:
                # join the list instead of slicing its string representation
                result_dict["genre"] = ", ".join(str(g) for g in genres)
            result_dict["language"] = result.get("language", None)
            result_dict["image_link"] = result.get("imageLinks", {}).get("thumbnail", None)
            # make sure Google_Results accepts keyword arguments like title, author...
            # and make them optional as they might not be in the returned result
            gr = Google_Results(**result_dict)
            results.append(gr)  # add it to the results list
    except ValueError:
        return None  # invalid response returned, you may raise an error instead
    return results  # return the results
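The remark about catching all exceptions can be checked in isolation. The sketch below uses two hypothetical helper functions (not part of the question's code) and raises KeyboardInterrupt by hand to stand in for Ctrl+C:

```python
def swallow_everything():
    try:
        raise KeyboardInterrupt  # stands in for the user pressing Ctrl+C
    except:  # bare except: also catches KeyboardInterrupt and SystemExit
        return "swallowed"


def swallow_errors_only():
    try:
        raise KeyboardInterrupt
    except Exception:  # KeyboardInterrupt derives from BaseException, so it passes through
        return "swallowed"


print(swallow_everything())

try:
    swallow_errors_only()
except KeyboardInterrupt:
    print("interrupt propagated")
```

Catching Exception, or better the specific errors you expect such as KeyError, keeps the interrupt working.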

With book_search in place, you can easily retrieve as much information as is available for a term:

gr = book_search("Google")

If your Google_Results type makes most of its entries optional, it will be far more tolerant of missing data.
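Since the question does not show Google_Results, the following is only a guess at what such a class might look like once every field is an optional keyword argument. The field names are taken from the question's constructor call; everything else is an assumption.

```python
class Google_Results:
    """Hypothetical container; the real class is not shown in the question."""

    def __init__(self, isbn13=None, isbn10=None, title=None, author=None,
                 publisher=None, published_date=None, description=None,
                 pages=None, genre=None, language=None, image_link=None):
        # Every field defaults to None, so a partial result still constructs.
        self.isbn13 = isbn13
        self.isbn10 = isbn10
        self.title = title
        self.author = author
        self.publisher = publisher
        self.published_date = published_date
        self.description = description
        self.pages = pages
        self.genre = genre
        self.language = language
        self.image_link = image_link


# A result with only two known fields, unpacked as keyword arguments.
book = Google_Results(**{"title": "Example Title", "language": "en"})
print(book.title, book.publisher)
```

Fields that the API did not return simply stay None and can be checked by the caller.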

Answer 1 (score: 0)

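Following @Coldspeed's suggestion, it became clear that missing information in the JSON response caused an exception to be raised. Because I only had a "pass" statement there, the entire result was skipped. So I will have to adapt the try/except statements so that errors are handled properly.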
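One possible way to adapt the try/except, sketched under the assumption that only a missing key should be tolerated: wrap each field lookup separately and catch KeyError alone, so one absent field no longer discards the whole result. The extract_fields helper and its sample input are hypothetical.

```python
def extract_fields(volume_info):
    """Return (fields, missing) for one volumeInfo dict."""
    # Maps our field names to the keys used in volumeInfo.
    wanted = {"title": "title", "publisher": "publisher", "pages": "pageCount"}
    fields, missing = {}, []
    for name, key in wanted.items():
        try:
            fields[name] = volume_info[key]
        except KeyError:  # only the missing-key case is handled here
            fields[name] = None
            missing.append(key)
    return fields, missing


fields, missing = extract_fields({"title": "Example Title", "pageCount": 120})
print(fields)   # {'title': 'Example Title', 'publisher': None, 'pages': 120}
print(missing)  # ['publisher']
```

Recording the missing keys, instead of passing silently, also makes it obvious why a given result looks incomplete.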

Thanks for the help, everyone!