This code logs in as TrickyBen and sends a request to the site's API...
import requests
from lxml import html
from requests import Session
import pandas as pd
import shutil
raceSession = Session()
LoginDetails = {'login': 'TrickyBen', 'password': 'TrickyBen123'}
LoginUrl = 'https://www.horseracebase.com/horse-racing-results.php?year=2005&month=3&day=15/horsebase1.php'
LoginPost = raceSession.post(LoginUrl, data=LoginDetails)
RaceUrl = 'https://www.horseracebase.com/excelresults.php'
RaceDataDetails = {"user": "41495", "racedate": "2005-3-15", "downloadbutton": "Excel"}
PostHeaders = {"Content-Type": "application/x-www-form-urlencoded"}
Response = raceSession.post(RaceUrl, data=RaceDataDetails, headers=PostHeaders)
Table = pd.read_table(Response.text)
Table.to_csv('blahblah.csv')
If you inspect the element, you'll see that the relevant markup looks like this...
<form action="excelresults.php" method="post">
<input type="hidden" name="user" value="41495">
<input type="hidden" name="racedate" value="2005-3-15">
<input type="submit" class="downloadbutton" value="Excel">
</form>
I get this error message...
Traceback (most recent call last):
File "/Users/Alex/Desktop/DateTest/hrpull.py", line 20, in <module>
Table = pd.read_table(Response.text)
File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 562, in parser_f
return _read(filepath_or_buffer, kwds)
File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 315, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 645, in __init__
self._make_engine(self.engine)
File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 799, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 1213, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "pandas/parser.pyx", line 358, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:3427)
File "pandas/parser.pyx", line 628, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:6861)
IOError: File race_date race_time track race_name race_restrictions_age race_class major race_distance prize_money going_description number_of_runners place distbt horse_name stall trainer horse_age jockey_name jockeys_claim pounds odds fav official_rating comptime TotalDstBt MedianOR Dist_Furlongs placing_numerical RCode BFSP BFSP_Place PlcsPaid BFPlcsPaid Yards RailMove RaceType
"2005-03-15" "14:00:00" "Cheltenham" "Letheby & Christopher Supreme Novices Hurdle " "4yo+" "Class 1" "Grade 1" "2m˝f " "58000" "Good" "20" "1st" "Arcalis" "0" "Johnson, J Howard" "5" "Lee, G" "0" "161" "21" "136" "3 mins 53.00s" "121.5" "16.5" "1" "National Hunt" "0" "0" "3" "0" "0" "0" "Novices Hurdle"
"2005-03-15" "14:00:00" "Cheltenham" "Letheby & Christopher Supreme Novices Hurdle " "4yo+" "Class 1" "Grade 1" "2m˝f " "58000" "Good" "20" "2nd" "6" "Wild Passion (GER)" "0" "Meade, Noel" "5" "Carberry, P" "0" "161" "11" "0" "3 mins 53.00s" "6" "121.5" "16.5" "2" "National Hunt" "0" "0" "3" "0" "0" "0" "Novices Hurdle"
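Answer 0 (score: 0)
I think you can view the data you want to download on another page, for example by clicking "My Systems (v4)". If so, you can download that page with urllib.request.urlretrieve, then parse the data with html.parser.HTMLParser and do whatever you like with it.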
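As a rough sketch of the parsing step, a small HTMLParser subclass can collect table cells row by row. The class name TableCellParser and the sample HTML are hypothetical; the real page's markup is not shown in this thread, so the tag handling would need adjusting to match it. The download step is only shown as a comment because it needs a logged-in session.

```python
from html.parser import HTMLParser

# Downloading would look roughly like this (not run here; page_url is a placeholder):
# from urllib.request import urlretrieve
# urlretrieve(page_url, "page.html")


class TableCellParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell strings
        self._row = None        # row currently being built, or None
        self._in_cell = False   # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


# Made-up HTML standing in for the downloaded page.
sample_html = """<table>
<tr><th>horse_name</th><th>place</th></tr>
<tr><td>Arcalis</td><td>1st</td></tr>
<tr><td>Wild Passion (GER)</td><td>2nd</td></tr>
</table>"""

parser = TableCellParser()
parser.feed(sample_html)
print(parser.rows)
```

Note that html.parser and urllib.request are the Python 3 module names; the traceback in the question comes from Python 2.7, where the equivalents live in HTMLParser and urllib.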
Answer 1 (score: 0)
If you look at the API called by the form's action, you'll see that you need to send a POST request to this URL:
https://www.horseracebase.com/excelresults.php
with the following parameters:
data = {
"user": "41495", # looks like this varies with login, so update in case you change your login id
"racedate": "2005-3-15",
"downloadbutton": "Excel"
}
You can do it like this:
response = raceSession.post(reqUrl, data=data)  # data= sends form-encoded fields, as the HTML form does; reqUrl is the URL above
If that doesn't work, try adding headers to the request, e.g. headers=postHeaders. In this case you should set the Content-Type header, because you are sending form-encoded data:
headers = {"Content-Type": "application/x-www-form-urlencoded"}
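To make the difference between form-encoded and JSON bodies concrete, here is a small standard-library sketch of the two encodings of the same parameters. In requests, passing a dict via data= produces the form-encoded body (and sets the matching Content-Type), while json= produces the JSON body. The variable names form_body and json_body are only for illustration.

```python
import json
from urllib.parse import urlencode

data = {"user": "41495", "racedate": "2005-3-15", "downloadbutton": "Excel"}

# What an HTML form submission (application/x-www-form-urlencoded) looks like.
form_body = urlencode(data)

# What a JSON request body (application/json) looks like.
json_body = json.dumps(data)

print(form_body)
print(json_body)
```

A PHP endpoint that reads ordinary form fields will normally only see the first form; a JSON body would arrive as raw text it is not expecting.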
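Read this for more information on how to save the Excel output to a file.
Here is Postman's response to this request; it looks like you don't need any headers other than Content-Type:
Edit
Here is what you need to do: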
raceSession = Session()
RaceUrl = 'https://www.horseracebase.com/excelresults.php'
RaceDataDetails = {"user": "41495", "racedate": "2005-3-15", "downloadbutton": "Excel"}
PostHeaders = {"Content-Type": "application/x-www-form-urlencoded"}
Response = raceSession.post(RaceUrl, data=RaceDataDetails, headers=PostHeaders)
from io import StringIO  # Python 3; also works on Python 2.7 here because Response.text is unicode
# from StringIO import StringIO  # Python 2.x alternative
Table = pd.read_table(StringIO(Response.text))
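The reason this helps: pd.read_table treats a plain string as a file path, which is why the original call failed with an IOError quoting the whole response as the "file" name. Wrapping the text in StringIO gives pandas a file-like object instead. The sketch below uses a made-up, shortened tab-separated sample in place of Response.text, so it runs without logging in; the column subset is chosen only for illustration.

```python
from io import StringIO

import pandas as pd

# Shortened stand-in for Response.text: tab-separated, quoted values.
sample_text = (
    "race_date\ttrack\thorse_name\n"
    '"2005-03-15"\t"Cheltenham"\t"Arcalis"\n'
    '"2005-03-15"\t"Cheltenham"\t"Wild Passion (GER)"\n'
)

# StringIO makes the in-memory text look like an open file.
table = pd.read_table(StringIO(sample_text))

# index=False leaves out the row-number column in the CSV output.
csv_text = table.to_csv(index=False)
print(csv_text)
```

To write a file rather than a string, pass a path instead, e.g. table.to_csv('blahblah.csv', index=False).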