Question

Minimal working example:

import json, urllib

front_url = "http://chroniclingamerica.loc.gov/search/titles/results/?city=&rows="
number_rows = "1"
middle_url = "&terms=&language=&lccn=&material_type=&year1=1690&year2=2016&labor=&county=&state=&frequency=&ethnicity=&page="
page = "1"
end_url = "&sort=relevance&format=json"

url = front_url + number_rows + middle_url + page + end_url

response = urllib.urlopen(url)
data = json.loads(response.read())

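The problem is that the data object recognizes the top-level JSON keys (totalItems, endIndex, startIndex, itemsPerPage and items), but the items object also has sub-levels that should be recognized (essay, county, title_normal, lccn, etc.). If you print data['items'], the code just spits out a messy string for the items object.

I would like to be able to extract each of the different pieces of information contained in the data['items'] layer into an array or something similar. How can I do this?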

Answer 1

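Your code works correctly.

What you are missing is that data['items'] is a list.

So, to access each element of this list, you have to use an index from 0 to len(data['items']) - 1.

Suggestion: use pprint to see the structure of the JSON data clearly.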

import json, urllib
import pprint
pp = pprint.PrettyPrinter(indent=1, width=80)

front_url = "http://chroniclingamerica.loc.gov/search/titles/results/?city=&rows="
number_rows = "1"
middle_url = "&terms=&language=&lccn=&material_type=&year1=1690&year2=2016&labor=&county=&state=&frequency=&ethnicity=&page="
page = "1"
end_url = "&sort=relevance&format=json"

url = front_url + number_rows + middle_url + page + end_url

response = urllib.urlopen(url)
data = json.loads(response.read())

pp.pprint(data['items'][0])  # [0] selects the first item
print data['items'][0]['essay']  # the essay element of the first item
print data['items'][0]['county']  # the county element of the first item
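
Note that the snippets in this thread are written for Python 2. In Python 3, urllib.urlopen is no longer available and the function lives in urllib.request. A minimal Python 3 sketch of the same fetch-and-parse step might look like this; the helper name fetch_json and the timeout value are illustrative choices, not part of the original code, and the response body is assumed to be UTF-8 encoded:

```python
import json
from urllib.request import urlopen


def fetch_json(url, timeout=10):
    """Download a URL and parse its body as JSON (hypothetical helper)."""
    # The context manager closes the connection once the body is read.
    with urlopen(url, timeout=timeout) as response:
        # read() returns bytes in Python 3; UTF-8 is assumed here.
        return json.loads(response.read().decode("utf-8"))


# Usage (performs a network request), with url built as in the question:
# data = fetch_json(url)
# first_item = data["items"][0]
```

After that, data['items'] behaves exactly as described above: it is a plain list of dictionaries.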

Answer 2

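In your example JSON data (which you should have linked directly), you can clearly see that items is a list of objects. In this case it contains just one object, which has a key essay. The value of that key is a list of strings (in this case just one string).

That string is not JSON. It is XHTML. Of course it will not be parsed by json.loads.

I believe this string is what you are calling the 'messy string'. The rest of the data in items is parsed just fine by json.loads.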
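
If you only want the readable text of the essay, one option is to strip the markup with the standard library's html.parser module. The following is a rough Python 3 sketch, not a full XHTML parser; the class and function names are made up for illustration, and sample_essay is a hypothetical string rather than real API output:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text nodes of an (X)HTML fragment and ignore the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text between tags.
        self.chunks.append(data)


def essay_to_text(essay_markup):
    parser = TextExtractor()
    parser.feed(essay_markup)
    parser.close()
    # Join the text nodes, then collapse any runs of whitespace.
    return " ".join("".join(parser.chunks).split())


# Hypothetical essay string, for illustration only.
sample_essay = (
    '<div xmlns="http://www.w3.org/1999/xhtml">'
    "<p>The <i>Butler Weekly Times</i> was a newspaper.</p></div>"
)
print(essay_to_text(sample_essay))
```

With real data you would call essay_to_text on each string in item['essay']. Because text nodes are joined without a separator, adjacent paragraphs run together; if that matters, a dedicated HTML library is probably a better fit.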

Answer 3

Are you trying to do something like this?

for item in data['items']:
    print item['county']
    print item['title_normal']
    print item['lccn']

Since there is only one item, this prints the following.

[u'Bates']
butler weekly times and the bates county record.
sn86063289
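
If the goal is to end up with one array per field, the same loop can fill lists instead of printing. Here is a small Python 3 sketch that runs offline; the sample string is a trimmed, hypothetical stand-in for the real response (only the county, title_normal and lccn values are taken from the output above, the top-level numbers are placeholders), and the names fields and columns are just for illustration:

```python
import json

# Hypothetical, trimmed-down response used instead of a live request.
sample = """
{"totalItems": 1, "startIndex": 1, "endIndex": 1, "itemsPerPage": 1,
 "items": [{"county": ["Bates"],
            "title_normal": "butler weekly times and the bates county record.",
            "lccn": "sn86063289"}]}
"""
data = json.loads(sample)

# The fields to pull out of every item.
fields = ["county", "title_normal", "lccn"]

# One list per field; item.get() yields None when an item lacks a field.
columns = {
    field: [item.get(field) for item in data.get("items", [])]
    for field in fields
}

print(columns["lccn"])
print(columns["county"])
```

Each list has one entry per item, so with rows set higher than 1 the lists simply get longer.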

Answer 4

Your code is perfectly fine. You can iterate over all of the items.

import json
import urllib

URL_PATTERN = "http://chroniclingamerica.loc.gov/search/titles/results/" \
    "?rows={rows}" \
    "&year1={year1}" \
    "&year2={year2}" \
    "&page={page}" \
    "&sort={sort}" \
    "&format={format}"

rows = "1"
page = "1"
year1 = "1690"
year2 = "2016"
sort_kind = "relevance" 
response_kind = "json"

url = URL_PATTERN.format(rows=rows, page=page, year1=year1, year2=year2,
                         sort=sort_kind, format=response_kind)

response = urllib.urlopen(url)
data = json.loads(response.read())

for item in data.get("items", []):
    # Pretty print.
    print(json.dumps(item, indent=4))

Also, keep in mind that you can simplify the URL above if you are not using some of the filter options.
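
One way to keep the URL short is to let the standard library build the query string and leave out the empty filters. This is a Python 3 sketch under the assumption that the server treats a missing parameter the same as an empty one; build_search_url and BASE_URL are hypothetical names, not part of any API:

```python
from urllib.parse import urlencode

BASE_URL = "http://chroniclingamerica.loc.gov/search/titles/results/"


def build_search_url(**params):
    """Build a search URL, skipping parameters that were left empty."""
    query = {
        key: value for key, value in params.items()
        if value not in ("", None)
    }
    # urlencode escapes the values and joins the pairs with "&".
    return BASE_URL + "?" + urlencode(query)


url = build_search_url(rows=1, year1=1690, year2=2016, page=1,
                       sort="relevance", format="json", state="")
print(url)
```

Here the empty state filter is dropped, and only the six parameters that carry a value end up in the URL.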

The Zen of Python says:

Beautiful is better than ugly.

and

Readability counts.

Getting JSON objects from a string in Python
