Question

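I'm trying to scrape data from a table, namely the one at http://stats.nba.com/leagueTeamGeneral.html?pageNo=1&rowsPerPage=30. I'm having trouble finding the right calls: I've tried various arguments and none of them worked. Ideally the data would come back in a format like: Atlanta Hawks, 32, 48.8, 18, 14, .563, etc. I can format the data without any problem; it's getting hold of the data in the first place that is causing me grief.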

    import urllib2
    from bs4 import BeautifulSoup

    page = 'http://stats.nba.com/leagueTeamGeneral.html?pageNo=1&rowsPerPage=30'
    page = urllib2.urlopen(page)
    soup = BeautifulSoup(page)
    for dS in soup.find_all(???):
        print(dS.get(???))

Answer 1

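Use a tool like Firebug for Firefox to trace the HTTP calls the page makes. Looking at the link you shared in Firebug's 'Net' tab shows that the data you're after is not in the page's HTML: it arrives in a subsequent request, made from http://www.nba.com/cmsinclude/desktopWrapperHeader_jsonp.html, and that response actually contains JSON data. I'm not sure BeautifulSoup will be much help here; try loading the response with Python's json module instead.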
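A minimal sketch of that suggestion, assuming Python 3 (where `urllib.request` replaces `urllib2`) and assuming the endpoint returns plain UTF-8 JSON. The helper names `parse_payload` and `load_json` are hypothetical, and the sample body is made up: only the `resultSets`, `headers` and `rowSet` keys come from this thread, while the column names are invented for illustration.

```python
import json
from urllib.request import urlopen


def parse_payload(raw):
    """Decode a response body (bytes) into Python objects."""
    return json.loads(raw.decode("utf-8"))


def load_json(url, timeout=10):
    """Fetch url and parse its body as JSON (assumes plain JSON, not JSONP)."""
    with urlopen(url, timeout=timeout) as response:
        return parse_payload(response.read())


# Made-up body standing in for the real response; column names are hypothetical.
sample = (
    b'{"resultSets": [{"headers": ["TEAM_NAME", "GP"],'
    b' "rowSet": [["Atlanta Hawks", 32]]}]}'
)
data = parse_payload(sample)
first_row = data["resultSets"][0]["rowSet"][0]
print(first_row)
```

In real use you would call `load_json` with whatever request URL the 'Net' tab shows for the table data; that URL is not reproduced here because it has to be read off the browser's network trace.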

Answer 2

Thanks for the suggestion, it worked quite well. What I ended up using was something like this:

    import json
    from pprint import pprint

    # Load the JSON from the local file NBA_DATA.json
    with open('NBA_DATA.json') as data_file:
        data = json.load(data_file)

    # Have this here for debugging, just to see the output
    pprint(data["resultSets"])

    for hed in data["resultSets"]:
        # more debugging
        # pprint(hed["headers"])
        # pprint(hed["rowSet"])
        list_of_s1 = list(hed["headers"])  # column names
        list_of_s2 = list(hed["rowSet"])   # one list of values per row
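To get from those two lists to the comma-separated output the question asks for, one option is to pair each row with the headers and join the values. This is only a sketch under assumptions: `iter_records` and `format_row` are hypothetical helpers, and the sample payload below is made up (the column names are invented; only the `resultSets` / `headers` / `rowSet` layout comes from the code above).

```python
def iter_records(data):
    """Yield one dict per row, keyed by that result set's headers."""
    for result_set in data["resultSets"]:
        headers = result_set["headers"]
        for row in result_set["rowSet"]:
            yield dict(zip(headers, row))


def format_row(values):
    """Join a row's values into a single comma-separated line."""
    return ", ".join(str(value) for value in values)


# Made-up payload in the same shape; the column names are hypothetical.
sample = {
    "resultSets": [
        {
            "headers": ["TEAM_NAME", "GP", "MIN", "W", "L", "W_PCT"],
            "rowSet": [["Atlanta Hawks", 32, 48.8, 18, 14, 0.563]],
        }
    ]
}

records = list(iter_records(sample))
lines = [format_row(record.values()) for record in records]
for line in lines:
    print(line)
```

Keeping each row as a dict means individual columns can be picked out by name, for example `records[0]["TEAM_NAME"]`, instead of by position.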

Scraping data from a sports table using Python and Beautiful Soup
