Web scraping: how do I access content rendered through JavaScript ng-binding?

Time: 2019-09-02 18:36:25

Tags: python angularjs web-scraping beautifulsoup

I am trying to scrape all player stats from

https://www.easports.com/madden-nfl/player-ratings?i=1&s=ovr_rating:DESC&v=true&=undefined

using Beautiful Soup (blogscraper.py):

import requests
from bs4 import BeautifulSoup
from csv import writer

response = requests.get('https://www.easports.com/madden-nfl/player-ratings/?i=1&s=ovr_rating:DESC&v=true&=undefined')
soup = BeautifulSoup(response.text, 'html.parser')

posts = soup.find_all(class_='player_rating-value')
print(posts)

But it gives me an empty list. If I instead try

posts = soup.find_all(class_='ratings-hub_database')
print(posts)

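it only gives me the stats up to Strength.

I have read other solutions saying that I need to get the XHR data from the Network tab, but I don't know how.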

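1 answer:

Answer 0 (score: 0):

Open the browser's Network tab and press F5 to refresh the page. Then press Ctrl + F to open the search box, type a player's name, and press Enter. Look through the matches and you will find that the following request returns the stats as JSON: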

import requests

# Endpoint found in the Network tab; it returns the player ratings as JSON,
# so BeautifulSoup is not needed here.
data = requests.get('https://www.easports.com/madden-nfl/ratings/service/data?entityType=madden19_player&filter=iteration:1&sort=ovr_rating:DESC,%20lastName:ASC&limit=25').json()
print(data)
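As a slightly more defensive variant of the request above, the sketch below adds a timeout, checks the HTTP status before decoding, and pulls out the list of records. The helper names `extract_docs` and `fetch_docs` are hypothetical, and the assumption that the records live under a `docs` key is taken from the CSV examples in this answer. The endpoint itself may have changed since this was written.

```python
URL = (
    "https://www.easports.com/madden-nfl/ratings/service/data"
    "?entityType=madden19_player&filter=iteration:1"
    "&sort=ovr_rating:DESC,%20lastName:ASC&limit=25"
)


def extract_docs(payload):
    """Return the list of player records, or [] if the payload has no usable 'docs' list."""
    docs = payload.get("docs") if isinstance(payload, dict) else None
    return docs if isinstance(docs, list) else []


def fetch_docs(url=URL, timeout=10):
    """Fetch the JSON payload and return its player records (hypothetical helper)."""
    # Imported here so extract_docs can be reused without the requests package.
    import requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return extract_docs(response.json())
```

Calling `fetch_docs()` should then give the same list that `data['docs']` holds in the examples here, provided the service still responds in that shape.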

You can write the result to a CSV file with pandas (note that you will need to specify your own column order for the header):

import requests
import pandas as pd

data = requests.get('https://www.easports.com/madden-nfl/ratings/service/data?entityType=madden19_player&filter=iteration:1&sort=ovr_rating:DESC,%20lastName:ASC&limit=25').json()
# Each entry in data['docs'] is one player record; it becomes one row of the frame.
df = pd.DataFrame(data['docs'])
df.to_csv(r'C:\Users\User\Desktop\Info.csv', sep=',', encoding='utf-8-sig', index=False)

Using the csv module:

import csv
import requests

data = requests.get('https://www.easports.com/madden-nfl/ratings/service/data?entityType=madden19_player&filter=iteration:1&sort=ovr_rating:DESC,%20lastName:ASC&limit=25').json()
with open("data.csv", "w", encoding="utf-8-sig", newline='') as csv_file:
    w = csv.writer(csv_file, delimiter=",", quoting=csv.QUOTE_MINIMAL)
    # Header comes from the first record; this relies on every record
    # having the same keys in the same order.
    w.writerow(list(data['docs'][0].keys()))
    for row in data['docs']:
        w.writerow(list(row.values()))
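If you want to control the header order yourself, one option is `csv.DictWriter` with an explicit `fieldnames` list. This is only a sketch: `write_players` is a hypothetical helper, the sample rows are made up, and the field names `lastName` and `ovr_rating` are assumed from the sort parameter in the URL rather than confirmed against the actual response.

```python
import csv
import io


def write_players(rows, fieldnames, out):
    """Write rows to the file-like object `out`, with columns in `fieldnames` order."""
    writer = csv.DictWriter(
        out,
        fieldnames=fieldnames,
        extrasaction="ignore",  # skip keys that are not listed in fieldnames
        restval="",             # leave the cell empty when a row lacks a key
    )
    writer.writeheader()
    writer.writerows(rows)


# Made-up rows standing in for data['docs'].
sample = [
    {"lastName": "Example", "ovr_rating": 99, "iteration": 1},
    {"ovr_rating": 98, "lastName": "Sample"},  # different key order, one key missing
]

buffer = io.StringIO()
write_players(sample, ["lastName", "ovr_rating"], buffer)
print(buffer.getvalue())
```

To write a real file, pass a handle opened with `newline=''` and `encoding='utf-8-sig'`, as in the example above, instead of the `StringIO` buffer. Unlike writing `row.values()` directly, this does not depend on every record listing its keys in the same order.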