Scraping a table with Python to get a list of players

Date: 2018-07-08 11:44:55

Tags: python python-2.7 web-scraping beautifulsoup

I'm trying to scrape the EA Sports FIFA player table from this website:

https://www.easports.com/fifa/ultimate-team/fut/database/results?position_secondary=LF,CF,RF,ST,LW,LM,CAM,CDM,CM,RM,RW,LWB,LB,CB,RB,RWB

I've run this simple piece of code, but I can't get any output from it:

import requests, bs4

r = requests.get('https://www.easports.com/fifa/ultimate-team/fut/database/results?position_secondary=LF,CF,RF,ST,LW,LM,CAM,CDM,CM,RM,RW,LWB,LB,CB,RB,RWB')
soup = bs4.BeautifulSoup(r.text, 'lxml')
contents = soup.find(class_='contrast-white')

Can anyone help me?

1 Answer:

Answer 0 (score: 0)

The problem with this page is that those elements are generated dynamically by JavaScript.
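
You can confirm this by looking at the raw HTML that requests actually receives; here is a quick sketch, reusing the URL and the contrast-white class from your own snippet:

import requests, bs4

url = ('https://www.easports.com/fifa/ultimate-team/fut/database/results'
       '?position_secondary=LF,CF,RF,ST,LW,LM,CAM,CDM,CM,RM,RW,LWB,LB,CB,RB,RWB')
r = requests.get(url)
soup = bs4.BeautifulSoup(r.text, 'lxml')

# If the table really is rendered by JavaScript, neither of these finds any player data:
print(soup.find(class_='contrast-white'))   # None
print('Ronaldo' in r.text)                  # False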

Luckily for us, most of the data comes from an API call, so we can get around this by taking our browser's cookies and sending the request to the actual API instead.

Here's what I came up with; hopefully it does what you need:

import json
import requests

def parse_item(item):
    """Pull the fields we care about out of a single player item in the API response."""
    attr_list = item['attributes']

    return {
        'name': item['name'],
        'type': item['playerType'],
        'OVR': item['rating'],  # the overall rating shown as OVR on the site
        'POS': item['position'],
        'PAC': get_attr_by_name(attr_list, 'PAC'),
        'DRI': get_attr_by_name(attr_list, 'DRI'),
        'SHO': get_attr_by_name(attr_list, 'SHO'),
        'DEF': get_attr_by_name(attr_list, 'DEF'),
        'PAS': get_attr_by_name(attr_list, 'PAS'),
        'PHY': get_attr_by_name(attr_list, 'PHY'),
    }

def get_attr_by_name(attr_list, attr_name):
    """Return the value of the attribute whose name ends with attr_name, or None if it is missing."""
    attr_name = attr_name.upper()

    try:
        return next(item['value'] for item in attr_list if item['name'].endswith(attr_name))
    except StopIteration:  # no attribute with that name in the list
        return None


cookies = {
    'hl': 'us',
    'ak_bmsc': '2F856B67859A41FAFB7A62172F068FA7C99F9D14F555000037F4435B86E7E136~plcKkcciaz+3qtfstmojfDw6NLaOVQ0MD41+JJKpeGyyladBNwRB0lLcC8lVi+ELaolN0j0Yzs6HiXjknNAgxjejeFu1I32ZeiaXDNykNhtnNweIIWc26f6y1G6fcpEnkqc2shuFIGn0qSRkilVLfccdJ9pi6yVVjS09lvCSNsi8dNPeU8QUxup+jHmez3zlPebfRyk1zZ8bFb6DBiZ0Dyj6fJepQ89AJ6Kcaf5Ynd3FgefDstwDxcRbDKnssM14iLiSjwri5VWdNP4KtsmmP2as63Xxc5MaVBbTjyk2i5/o8Rj852VMkBWPlskrlkBkliBwOTM4rIFXxZhSSwO2+gog==',
    'bm_sv': '830B3A15206003312D12E0B6FB4A2696~GupjwX5n1ZUaBybPwNV8B+/mIEouVASaWGBxPDg0p/S9lbZ98ziLYDEUArV6w2sGEn7NdWMub6mV5tEsGLoEgI48TmNE1/TUwtEyJcmtg2SlGBlGzFi64B2XdCR6oL2xy92x6zdNb6kOL3U+8YaBhQxd5nutL7sFddcENkQOb3E=',
    'DOT_COM_PHPSESSID': 'e4r4ekoramipe1qvahf0fp2630',
}

# The User-Agent is an HTTP header rather than a cookie, so send it through headers=.
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0',
}

params = {
    # The API expects jsonParamObject to be a JSON-encoded string, so serialise it with json.dumps.
    'jsonParamObject': json.dumps({
        'page': 1,
        'position': 'LF,CF,RF,ST,LW,LM,CAM,CDM,CM,RM,RW,LWB,LB,CB,RB,RWB'
    })
}

r = requests.get(
    'https://www.easports.com/fifa/ultimate-team/api/fut/item',
    params=params,
    cookies=cookies,
    headers=headers
)

items = r.json()['items']

data = [parse_item(item) for item in items]

The JSON response is quite large, so I wrote a couple of helper functions to pull out the data you need.
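
For reference, get_attr_by_name assumes that each entry of item['attributes'] is a small dict with 'name' and 'value' keys, and that the short attribute code sits at the end of the name. The name strings below are made up purely for illustration:

attr_list = [
    {'name': 'fut.attribute.PAC', 'value': 98},   # illustrative names/values, not real API output
    {'name': 'fut.attribute.SHO', 'value': 99},
]

get_attr_by_name(attr_list, 'pac')   # -> 98 (matched case-insensitively by suffix)
get_attr_by_name(attr_list, 'DEF')   # -> None (no such attribute in the list)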

data is a list of dictionaries. Here's what a single element looks like:

>>> data[0]
{'name': 'Cristiano Ronaldo', 'type': 'TEAM OF THE YEAR', 'OVR': 99, 'POS': 'LW', 'PAC': 98, 'DRI': 98, 'SHO': 99, 'DEF': 50, 'PAS': 94, 'PHY': 95}
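
Since each entry is a plain dict, you can filter and sort the list with ordinary Python. For example, printing the ten highest-rated players from the parsed data:

# Sort by overall rating, highest first, and print a compact table.
top_players = sorted(data, key=lambda p: p['OVR'], reverse=True)

for p in top_players[:10]:
    print('{name:<30} {POS:<4} OVR {OVR}'.format(**p))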

You will probably have to change the values in cookies to the ones your own browser sets.
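
One convenient way to do that is to copy the whole Cookie request header from your browser's developer tools (network tab) and split it into a dict; a small sketch, where the placeholder string would be replaced with the header value your browser actually sends:

# Paste the value of the "Cookie:" header from a request to easports.com here.
raw_cookie_header = 'hl=us; DOT_COM_PHPSESSID=...; ak_bmsc=...; bm_sv=...'

cookies = dict(
    pair.split('=', 1) for pair in raw_cookie_header.split('; ')
)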