BeautifulSoup or urllib.request in Python returns different content on different machines

Asked: 2014-12-15 23:55:56

Tags: python beautifulsoup urllib

I wrote this simple script, but it only works on my Linux machine and not on Windows 8.1.

The code is:

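from urllib.request import urlopen
from bs4 import BeautifulSoup

# matchId is set elsewhere in the script
BASE_URL = "http://www.betfair.com/exchange/football/event?id=" + str(matchId)
html = urlopen(BASE_URL).read()
soup = BeautifulSoup(html)
homeScore = soup.find_all("span", {"class": "home-score"})[0].text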

On my Windows 8 machine, urlopen returns:

html    bytes: b'\\n\\n    \\n    <!DOCTYPE html>\\n\\n    <!--[if IE]><![endif]-->\\n\\n    <!--[if       IE 9]>\\n    <html class="ie9" lang="da-DK"><![endif]--><!--[if IE 8]>\\n    <html class="ie8"   lang="da-DK"><![endif]--><!--[if IE 7]>\\n    <html class="ie7" lang="da-DK"><![endif]--><!--[if lt IE 7]>\\n    <html class="ie6" lang="da-DK"><![endif]-->\\n    <!--[if (gt IE 9)|!(IE)]><!-->\\n    <html lang="da-DK">\\n    <!--<![endif]-->\\n    <head>\\n        <meta name="description" content="San Luis de Quillota v Deportes Temuco ma 15 dec 2014 11:00PM - betting odds. Find markedets bedste spil, samt links til andre ressourcer.">\\n    <meta charset="utf-8">\\n    <meta name="viewport" content="width=device-width,minimum-scale=1.0,maximum-scale=1.0,user-scalable=no"/>\\n    <base href="http://www.betfair.com/exchange/"/>\\n    <title>        San Luis de Quillota v Deportes Temuco betting odds | Chilean Primera B | betfair.com\\n</title>\\n\\n    <link rel="shortcut icon" href="//sn4.cdnbf.net/exchange/favicon_13031_....    

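The dots are the actual end of the output. How can I make the same code work on both systems?

Edit: my Windows 8 machine runs Python 3.4 and the Linux machine runs Python 3.2.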

1 answer:

Answer 0 (score: 0)

If you have the option, consider using Requests instead of urllib:

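import requests
from bs4 import BeautifulSoup

# matchId comes from the question's script
base_url = 'http://www.betfair.com/exchange/football/event'
params = {'id': str(matchId)}
r = requests.get(base_url, params=params)
# Decode the raw bytes as UTF-8, dropping any bytes that cannot be decoded
html = r.content.decode('utf-8', 'ignore')
# The "lxml" parser requires the lxml package to be installed
soup = BeautifulSoup(html, "lxml")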

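Requests is built to handle multiple formats seamlessly, so this should work on every platform. If it does not, try different arguments to r.content.decode(). Either way, this is much easier than using urllib.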
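To illustrate what trying different arguments to decode() could look like, here is a minimal sketch using only the standard library. The helper name decode_html, the candidate encoding list, and the sample bytes are all assumptions made for illustration; they do not come from the question or from the Requests API. The sample simply stands in for r.content.

```python
def decode_html(raw, encodings=("utf-8", "cp1252")):
    """Return (text, label) for the first encoding that decodes raw strictly.

    decode_html and its default encoding list are hypothetical; adjust
    the candidates to whatever the server is likely to send.
    """
    for enc in encodings:
        try:
            # Strict decoding raises instead of silently dropping bytes
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Last resort: the lenient call from the answer, which drops bad bytes
    return raw.decode("utf-8", "ignore"), "utf-8 (lossy)"


# Hypothetical bytes standing in for r.content; not real site output
sample = '<span class="home-score">2</span> Temuco é'.encode("cp1252")

text, used = decode_html(sample)
print(used)
print(text)
```

Trying strict decoding first makes it visible when the bytes are not really UTF-8, whereas passing 'ignore' straight away hides that by discarding the offending bytes.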