Question

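I have an unusual situation that I'm looking for a solution to. There is a table I want to scrape with pandas, but the website requires me to log in. Because I have a monthly subscription, I can view the whole table when logged in; otherwise the site returns only the first 20 rows, and some of those are NaN. I'm new to pandas and love how simple the script for reading a table is. In the logged-in view, all of the data is available. I have only ever used Selenium for logging in, which is where this idea came from.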

Here is the link

import pandas as pd

# read_html returns a list of DataFrames, one per <table> found on the page.
# Without a logged-in session the site returns only the first 20 rows.
dfs = pd.read_html('https://www.baseball-reference.com/play-index/game_finder.cgi?request=1&'
                   'match=basic&series=any&series_game=any&min_year_game=2018&max_year_game=2018'
                   '&WL=any&team_id=ANY&opp_id=ANY&game_length=any&bats=any&throws=any&pos_1=1&pos_2=1'
                   '&pos_3=1&pos_4=1&pos_5=1&pos_6=1&pos_7=1&pos_8=1&pos_9=1&pos_10=1&pos_11=1&pos_12=1'
                   '&exactness=any&HV=any&GS=anyGS&GF=anyGF&is_birthday=either&temperature_min=0&temperature_max'
                   '=120&wind_speed_min=0&wind_speed_max=90&as=result_batter&class=player&offset=0&type=b&c1gtlt'
                   '=gt&c2gtlt=gt&c3gtlt=gt&c4gtlt=gt&c5gtlt=gt&c5val=1.0&location=pob&locationMatch=is&orderby=HR&number_matched=1')

for i, df in enumerate(dfs):
    # Print the current table, not the whole list.
    print(df)

    # Give each table its own file so later tables do not overwrite earlier ones.
    df.to_csv(f'ALL_Ref_AtBats_{i}.csv', mode='w')

Answer 1

You can use Selenium for web automation, since you need to automate the login.
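A rough sketch of the Selenium route, assuming Selenium 4 with Chrome. The form field names (`username`, `password`) and the submit selector are hypothetical placeholders, as are the function name and the login URL you pass in; inspect the real login form to find the actual values. The idea is to log in through a real browser, load the table page in the same session, and hand the rendered HTML to `pd.read_html`.

```python
import io
import sys

import pandas as pd


def login_and_get_html(login_url, data_url, username, password, timeout=15):
    """Log in through a real browser, then return the HTML of the data page."""
    # Imported lazily so this module can be imported without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(login_url)
        url_before_submit = driver.current_url
        # Hypothetical field names; check the real login form for the actual ones.
        driver.find_element(By.NAME, "username").send_keys(username)
        driver.find_element(By.NAME, "password").send_keys(password)
        driver.find_element(By.CSS_SELECTOR, "[type='submit']").click()
        # Wait until the browser leaves the login page before requesting the table.
        WebDriverWait(driver, timeout).until(EC.url_changes(url_before_submit))
        driver.get(data_url)
        return driver.page_source
    finally:
        driver.quit()


if __name__ == "__main__":
    # Usage: python scrape.py LOGIN_URL DATA_URL USERNAME PASSWORD
    html = login_and_get_html(*sys.argv[1:5])
    # Wrap the HTML in StringIO; recent pandas versions deprecate literal HTML strings.
    dfs = pd.read_html(io.StringIO(html))
    for i, df in enumerate(dfs):
        df.to_csv(f"table_{i}.csv", index=False)
```

Waiting for the URL to change is only one way to detect a finished login; if the site stays on the same URL after submitting, waiting for an element that appears only when logged in may work better.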

Another solution is to use requests and send the cookies from your logged-in session along with the GET request to the URL.
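A minimal sketch of the cookie approach, assuming you log in with your normal browser and copy the session cookies from its developer tools. The cookie name `session_id`, the helper names `make_session` and `fetch_tables`, and the browser-like User-Agent are all illustrative assumptions; the site's real cookie names are not known here.

```python
import io
import sys

import pandas as pd
import requests


def make_session(cookies):
    """Return a requests.Session that sends the given cookies with every request."""
    session = requests.Session()
    session.cookies.update(cookies)
    # Assumption: some sites reject the default requests User-Agent.
    session.headers.update({"User-Agent": "Mozilla/5.0"})
    return session


def fetch_tables(session, url):
    """GET the page with the logged-in cookies and parse every <table> it contains."""
    response = session.get(url, timeout=30)
    response.raise_for_status()
    # Wrap the HTML in StringIO; recent pandas versions deprecate literal HTML strings.
    return pd.read_html(io.StringIO(response.text))


if __name__ == "__main__":
    # Usage: python scrape.py DATA_URL COOKIE_VALUE
    # "session_id" is a hypothetical cookie name; use the names your browser shows.
    session = make_session({"session_id": sys.argv[2]})
    for i, df in enumerate(fetch_tables(session, sys.argv[1])):
        df.to_csv(f"table_{i}.csv", index=False)
```

If the result still has only 20 rows, the request is probably not being treated as logged in, which usually means a required cookie is missing or has expired.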
