Using BeautifulSoup

Time: 2015-09-25 02:39:58

Tags: python beautifulsoup

I am using BeautifulSoup in Python 3.4 as follows:

from urllib.request import urlopen
from bs4 import BeautifulSoup

soup = BeautifulSoup(urlopen(URL), 'html.parser')

for fraction in soup.findAll("div", { "class" : "eventprice" }):
    print(fraction.get_text())

The data I want to extract looks like this:

<div id="ip_selection983317834" class="eventprice">


                    1/2


        </div>

I have explored several options with fraction.get_div, both with and without changing the attributes. What is going on here?

1 answer:

Answer 0 (score: 2)

Simply switching to requests made it work for me:

from bs4 import BeautifulSoup
import requests

URL = "http://sports.williamhill.com/bet/en-gb/betting/y/5/tm/0/Football.html"
response = requests.get(URL)

soup = BeautifulSoup(response.content, 'html.parser')

for fraction in soup.findAll("div", { "class" : "eventprice" }):
    print(fraction.get_text(strip=True))

Prints:

1/2
16/5
11/2
8/5
...
5/6
21/10
7/2
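
To check the parsing step on its own, without any network access, the selector can be run against the snippet from the question. This is only a sketch: the html string is copied from the question, and find_all with class_ is used as the equivalent of the findAll call above.

```python
from bs4 import BeautifulSoup

# The div from the question, including its surrounding whitespace.
html = """
<div id="ip_selection983317834" class="eventprice">


                    1/2


        </div>
"""

soup = BeautifulSoup(html, "html.parser")

# strip=True removes the leading and trailing whitespace around the price.
prices = [div.get_text(strip=True) for div in soup.find_all("div", class_="eventprice")]
print(prices)
```

If this finds the price but the live page yields nothing, the difference most likely lies in what the server returns rather than in the selector.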

My guess is that this is because of the default headers that requests sends. In my case, they are:

{'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'User-Agent': 'python-requests/2.3.0 CPython/2.7.6 Darwin/14.1.0'}
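
If the headers really are the cause (this is a guess, not something confirmed here), a similar request can be sent from urllib by passing explicit headers through a Request object. The sketch below assumes the User-Agent is what matters; build_request and fetch_html are hypothetical helper names, and the User-Agent string is simply copied from the headers above. Accept-Encoding is left out on purpose, because urlopen does not decompress gzip responses by itself.

```python
from urllib.request import Request, urlopen

# Copied from the requests headers shown above; any value the server accepts would do.
USER_AGENT = "python-requests/2.3.0 CPython/2.7.6 Darwin/14.1.0"


def build_request(url, user_agent=USER_AGENT):
    """Build a Request with explicit headers instead of urllib's default User-Agent."""
    return Request(url, headers={"User-Agent": user_agent, "Accept": "*/*"})


def fetch_html(url):
    """Fetch the raw page bytes using the headers from build_request."""
    with urlopen(build_request(url)) as response:
        return response.read()


# Usage (performs a network request, so it is not run here):
# soup = BeautifulSoup(fetch_html(URL), 'html.parser')
```

Comparing the output with and without the custom User-Agent is a quick way to test the guess.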