Using a for loop for data scraping

Time: 2020-03-07 18:45:27

Tags: python html

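I am trying to learn data scraping, and I have got this far by scraping horse-racing data from the oddschecker site. I am using Anaconda and Spyder.

I am currently at the point where the code below pulls in all of the information I need: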

from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

# Fetch a page with a browser-like User-Agent and return the parsed soup
def fetchSoup(url, userAgent='Mozilla/5.0'):
    req = Request(url, headers={'User-Agent': userAgent})
    with urlopen(req) as response:
        html = response.read()
    return BeautifulSoup(html, "lxml")

# Define my URL
url = 'https://www.oddschecker.com/horse-racing/chelmsford-city/20:30/top-3-finish'
soup = fetchSoup(url)

# prettify() is called (hence the parentheses) and its result is assigned to the html variable
html = soup.prettify()

# Attribute filter for the class "diff-row evTabRow bc", which is where the horses are split within the HTML
splitperhorse = {'class': 'diff-row evTabRow bc'}
# Find every <tr> with that class: one row per horse
horseinfo = soup.find_all('tr', splitperhorse)

This splits the data up as follows:

 horseinfo[0]

 <tr class="diff-row evTabRow bc" data-best-bks="B3,BF,MK" data-best-dig="1.5" data-bid="26459041728" data-bname="Sharney" data-hcap="" data-hcap-sort="1" data-stall="7"><td class="cardnum">9</td><td class="sel nm has-silks basket-active"><span class="float-wrap"><span class="beta-sprite add-to-bet-basket" data-name="Sharney" data-ng-click="MainController.addToMultipleBetSlip(26459041728, 3490400883, 1.5)" data-track="&amp;lid=grid&amp;lpos=basket-add" title="Add Sharney to betslip"></span></span><img alt="Sharney silk" class="silks" height="29" src="https://static.oddschecker.com/content/racing-silks/24372.gif?v=1.0.15" width="39"/><span class="float-wrap name-wrap"><span class="tcell"><div class="top-row"><a class="popup selTxt" data-name="Sharney" href="https://www.oddschecker.com/horse-racing/chelmsford-city/20:30/top-3-finish/bet-history/sharney" target="_blank" title="View odds history for Sharney">Sharney<span class="stall"> (7)</span></a></div><div class="bottom-row jockey"><span class="current-form">0-40</span></div></span></span></td><td class="bc bs oi b" data-bk="B3" data-fodds="1.9" data-hcap="" data-o="1/2" data-odig="1.5"><p>1/2</p></td><td class="bc bs oi" data-bk="SK" data-fodds="2.0" data-hcap="" data-o="2/5" data-odig="1.4"><p>2/5</p></td><td class="bc bs oi" data-bk="LD" data-fodds="1.4" data-hcap="" data-o="4/11" data-odig="1.36"><p>4/11</p></td><td class="bc bs oi" data-bk="WH" data-ew-denom="1" data-ew-places="3" data-fodds="1.83" data-hcap="" data-o="4/11" data-odig="1.36"><p>4/11</p></td><td class="np o" data-bk="EE" data-fodds="" data-hcap="" data-o="" data-odig="0"></td><td class="bc bs oi" data-bk="FB" data-fodds="1.4" data-hcap="" data-o="4/11" data-odig="1.36"><p>4/11</p></td><td class="bc bs oi" data-bk="VC" data-fodds="1.91" data-hcap="" data-o="2/5" data-odig="1.4"><p>2/5</p></td><td class="bc bs oi" data-bk="PP" data-fodds="1.4" data-hcap="" data-o="4/11" data-odig="1.36"><p>4/11</p></td><td class="np o" data-bk="UN" data-fodds="" 
data-hcap="" data-o="" data-odig="0"></td><td class="bc bs oi" data-bk="CE" data-fodds="1.4" data-hcap="" data-o="4/11" data-odig="1.36"><p>4/11</p></td><td class="bc bs oi" data-bk="FR" data-fodds="1.8" data-hcap="" data-o="4/11" data-odig="1.36"><p>4/11</p></td><td class="bc bs oi" data-bk="WA" data-fodds="1.33" data-hcap="" data-o="2/7" data-odig="1.29"><p>2/7</p></td><td class="bc bs oi" data-bk="SA" data-fodds="1.25" data-hcap="" data-o="2/9" data-odig="1.22"><p>2/9</p></td><td class="bc bs o" data-bk="BY" data-fodds="1.36" data-hcap="" data-o="4/11" data-odig="1.36"><p>4/11</p></td><td class="np o" data-bk="VT" data-fodds="" data-hcap="" data-o="" data-odig="0"></td><td class="bc bs oi" data-bk="OE" data-fodds="1.25" data-hcap="" data-o="2/9" data-odig="1.22"><p>2/9</p></td><td class="np o" data-bk="SO" data-fodds="" data-hcap="" data-o="" data-odig="0"></td><td class="bc bs oi" data-bk="BH" data-fodds="1.25" data-hcap="" data-o="2/9" data-odig="1.22"><p>2/9</p></td><td class="bc bs o" data-bk="GN" data-fodds="1.36" data-hcap="" data-o="4/11" data-odig="1.36"><p>4/11</p></td><td class="bc bs o" data-bk="SX" data-ew-denom="0" data-ew-places="0" data-fodds="1.44" data-hcap="" data-o="4/9" data-odig="1.44"><p>4/9</p></td><td class="np o" data-bk="MR" data-fodds="" data-hcap="" data-o="" data-odig="0"></td><td class="wo wo-col"></td><td class="bc bs oi b" data-bk="BF" data-fodds="1.95" data-hcap="" data-o="8/15" data-odig="1.53" data-x-selection="27351256*1.169809702*horse-racing*29739583*1.169809702"><p>8/15</p></td><td class="np o" data-bk="BD" data-fodds="" data-hcap="" data-o="" data-odig="0"></td><td class="bc bs oo b" data-bk="MK" data-fodds="1.22" data-hcap="" data-o="8/15" data-odig="1.53"><p>8/15</p></td></tr>

What I want to do is get this into a one-row-per-horse format, i.e.

Sharney Price1 Price2 Price3 Price4

and then (not important right now) create a CSV to export.

I am trying to use a for loop for this, but I am struggling to get to grips with it.
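For illustration, here is a minimal sketch of the kind of loop I have in mind. It assumes the horse name can be read from the row's `data-bname` attribute and each price from the `data-o` attribute of the `<td>` cells that carry a `data-bk` bookmaker code, as in the row shown above. `SAMPLE_HTML` is a cut-down stand-in for the real page, "Example Horse" and its odds are made up, and `rows_to_records` is a hypothetical helper name. It uses the built-in "html.parser" so the sketch runs without lxml.

```python
from bs4 import BeautifulSoup

# Cut-down stand-in for the real page. The first row copies a few cells from
# the Sharney row; the second row ("Example Horse") is invented for the sketch.
SAMPLE_HTML = """
<table>
<tr class="diff-row evTabRow bc" data-bname="Sharney">
  <td class="cardnum">9</td>
  <td class="bc bs oi b" data-bk="B3" data-o="1/2" data-odig="1.5"><p>1/2</p></td>
  <td class="np o" data-bk="EE" data-o="" data-odig="0"></td>
  <td class="bc bs oi" data-bk="SK" data-o="2/5" data-odig="1.4"><p>2/5</p></td>
</tr>
<tr class="diff-row evTabRow bc" data-bname="Example Horse">
  <td class="cardnum">4</td>
  <td class="bc bs oi" data-bk="B3" data-o="7/2" data-odig="4.5"><p>7/2</p></td>
  <td class="bc bs oi" data-bk="SK" data-o="4/1" data-odig="5"><p>4/1</p></td>
  <td class="np o" data-bk="EE" data-o="" data-odig="0"></td>
</tr>
</table>
"""


def rows_to_records(rows):
    """Turn each <tr> into a list of the form [name, price, price, ...]."""
    records = []
    for row in rows:                      # outer loop: one pass per horse
        name = row["data-bname"]
        prices = []
        # inner loop: only cells that carry a bookmaker code
        for cell in row.find_all("td", attrs={"data-bk": True}):
            odds = cell.get("data-o", "")
            if odds:                      # skip bookmakers offering no price
                prices.append(odds)
        records.append([name] + prices)
    return records


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
horseinfo = soup.find_all("tr", {"class": "diff-row evTabRow bc"})

for record in rows_to_records(horseinfo):
    print(" ".join(record))
```

Skipping the empty cells keeps each printed line short, but it means the nth price no longer belongs to the same bookmaker for every horse. If the columns need to line up, appending `odds` unconditionally (empty strings included) would probably be the safer choice.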

If anyone could give me some guidance, I would be very grateful.
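For the CSV export mentioned above, a rough sketch using the standard-library `csv` module might look like the following. It assumes each horse has already been reduced to a list of the form `[name, price, price, ...]`. The `records` values, the `write_records` helper, the `Horse`/`PriceN` header names, and the `odds.csv` file name are all placeholders, not anything taken from the site.

```python
import csv
import io

# Assumed input shape: one list per horse, name first, then its prices.
# "Example Horse" and its prices are invented for this sketch.
records = [
    ["Sharney", "1/2", "2/5", "4/11"],
    ["Example Horse", "7/2", "4/1"],
]


def write_records(records, handle):
    """Write a header plus one CSV row per horse to an open text handle."""
    # Widest row decides how many Price columns the header needs.
    width = max((len(r) for r in records), default=1)
    writer = csv.writer(handle)
    writer.writerow(["Horse"] + [f"Price{i}" for i in range(1, width)])
    for record in records:
        # Pad short rows with empty strings so every row has the same length.
        writer.writerow(record + [""] * (width - len(record)))


# Write to an in-memory buffer so the result can be inspected directly.
buffer = io.StringIO()
write_records(records, buffer)
print(buffer.getvalue())

# To write a real file instead (the file name is a placeholder):
# with open("odds.csv", "w", newline="") as f:
#     write_records(records, f)
```

Passing `newline=""` to `open` is what the `csv` module documentation recommends, since the writer emits its own line endings.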

0 answers:

No answers