Printing multiple web page items from a JSON dictionary

Asked: 2014-07-31 18:04:41

Tags: python json scrapy

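I am using the Python.org build of Python 2.7 (64-bit) on Windows Vista 64-bit. I have the following code, which extracts data from the JavaScript 'DataStore.prime' item embedded in the HTML of the page referenced in the code: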

from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.utils.markup import remove_tags
from scrapy.cmdline import execute
import csv
import re
import json

filepath = "C:\\Python27\\Football Data\\test" + ".txt"

with open(filepath, "w") as f:
    f.write("")
    f.close()

class MySpider(Spider):

    name = "goal2"
    allowed_domains = ["whoscored.com"]
    start_urls = ["http://www.whoscored.com/Teams/705/Archive/Israel-Maccabi-Haifa"]      

    def parse(self, response):

        playerdata = re.search(re.escape("DataStore.prime('stage-player-stat', defaultTeamPlayerStatsConfigParams.defaultParams , ") + '(\[.*\])' + re.escape(");"), response.body).group(1)

        for player in json.loads(playerdata):
            print player['FirstName']

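This works fine and lists the first names of all the players in the main table on the page. However, when I try to print more than one field from the datastore, for example 'FirstName' and 'LastName', by changing the print statement to print player['FirstName', 'LastName'], I get the following error: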

    print player['FirstName', 'LastName']
 exceptions.KeyError: ('FirstName', 'LastName')

Can anyone tell me why this doesn't work, and how to modify the code to return multiple data fields from DataStore.prime?

Thanks

1 Answer:

Answer 0 (score: 1)

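When player['FirstName', 'LastName'] is evaluated, Python treats 'FirstName', 'LastName' as the tuple ('FirstName', 'LastName') and uses that tuple as a single key. The dictionaries returned by json.loads have no tuple keys (JSON object keys are always strings), so the lookup raises KeyError. You need to look up each field separately and then join the values together.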

# simulate loading playerdata
players = [
    {'FirstName': 'Podge', 'LastName': 'Hasglow'},
    {'FirstName': 'Milo', 'LastName': 'Holloway'},
    {'FirstName': 'Staisy', 'LastName': 'Beccasdaughter'},
]
# or in your case:
##players = json.loads(playerdata)
# now print them all
for player in players:
    player_fullname = ' '.join(player[colname] for colname in ['FirstName', 'LastName'])
    print(player_fullname)
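
As a further sketch (not part of the original answer), the tuple-key behaviour can be reproduced directly, and operator.itemgetter from the standard library offers another way to fetch several fields in one call. The sample records below are hypothetical stand-ins for json.loads(playerdata), and the names tuple_key, get_name and full_names are introduced here only for illustration.

```python
from operator import itemgetter

# Hypothetical sample records standing in for json.loads(playerdata)
players = [
    {'FirstName': 'Podge', 'LastName': 'Hasglow'},
    {'FirstName': 'Milo', 'LastName': 'Holloway'},
]

# A comma inside the brackets builds a tuple, so this lookup is the same as
# players[0][('FirstName', 'LastName')] and fails with KeyError
try:
    players[0]['FirstName', 'LastName']
except KeyError as err:
    tuple_key = err.args[0]  # the tuple that was used as the key

# itemgetter called with several keys returns a tuple of the matching values
get_name = itemgetter('FirstName', 'LastName')
full_names = [' '.join(get_name(player)) for player in players]

for name in full_names:
    print(name)
```

Note that itemgetter, like plain indexing, raises KeyError if a record lacks one of the fields; player.get(colname, '') is one option if some fields may be missing.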