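I am using the Python.org build of Python 2.7 (64-bit) on Windows Vista 64-bit. I have the following code, which extracts data from the JavaScript call 'DataStore.prime' embedded in the HTML of the page referenced in the code: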
from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.utils.markup import remove_tags
from scrapy.cmdline import execute
import csv
import re
import json

filepath = "C:\\Python27\\Football Data\\test" + ".txt"
with open(filepath, "w") as f:
    f.write("")
    f.close()

class MySpider(Spider):
    name = "goal2"
    allowed_domains = ["whoscored.com"]
    start_urls = ["http://www.whoscored.com/Teams/705/Archive/Israel-Maccabi-Haifa"]

    def parse(self, response):
        playerdata = re.search(re.escape("DataStore.prime('stage-player-stat', defaultTeamPlayerStatsConfigParams.defaultParams , ") + '(\[.*\])' + re.escape(");"), response.body).group(1)
        for player in json.loads(playerdata):
            print player['FirstName']
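This works well and prints the first name of every player in the page's main table. However, when I try to print more than one field from the datastore, for example 'FirstName' and 'LastName', by changing the print statement to use ['FirstName', 'LastName'], I get the following error: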
print player['FirstName', 'LastName']
exceptions.KeyError: ('FirstName', 'LastName')
Can anyone tell me why this does not work, and how to modify the code to return multiple data fields from DataStore.prime?
Thanks
Answer 0 (score: 1)
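When Python evaluates player['FirstName', 'LastName'], it builds the tuple ('FirstName', 'LastName') and uses that tuple as a single key. The dictionaries returned by json.loads have no tuple keys, only string keys such as 'FirstName', so the lookup raises KeyError. You therefore need to look up each field separately and join the results together.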
# simulate loading playerdata
players = [
    {'FirstName': 'Podge', 'LastName': 'Hasglow'},
    {'FirstName': 'Milo', 'LastName': 'Holloway'},
    {'FirstName': 'Staisy', 'LastName': 'Beccasdaughter'},
]
# or in your case:
##players = json.loads(playerdata)
# now print them all
for player in players:
    player_fullname = ' '.join(player[colname] for colname in ['FirstName', 'LastName'])
    print(player_fullname)
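
As a small variation on the loop above, here is a sketch that first reproduces the KeyError and then fetches both fields in one call with operator.itemgetter from the standard library. The sample rows are made up for illustration and stand in for json.loads(playerdata); the names get_names, full_names, and tuple_lookup_failed are hypothetical, and the sketch assumes both fields are always present and are strings.

```python
from operator import itemgetter

# Hypothetical sample rows standing in for json.loads(playerdata)
players = [
    {'FirstName': 'Podge', 'LastName': 'Hasglow'},
    {'FirstName': 'Milo', 'LastName': 'Holloway'},
]

# A comma inside the brackets builds one tuple key, so this lookup fails
# on a dictionary that only has string keys.
try:
    players[0]['FirstName', 'LastName']
    tuple_lookup_failed = False
except KeyError:
    tuple_lookup_failed = True

# itemgetter called with several keys returns a tuple of the matching values.
get_names = itemgetter('FirstName', 'LastName')
full_names = [' '.join(get_names(player)) for player in players]

for name in full_names:
    print(name)
```

If a field could be missing or hold a non-string value such as None, player.get('LastName', '') or an explicit str() conversion would probably be a safer choice than a bare lookup.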