Python BeautifulSoup: parsing a table of Yahoo Fantasy Football data

Date: 2015-11-20 17:00:39

Tags: python python-2.7 beautifulsoup mechanize

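For fun, I'm trying to scrape some player transaction data from my Yahoo Fantasy Football league. This is my first time using mechanize and BeautifulSoup, and I'm having trouble printing specific data. What I want to pull out is the player's name if they were sent "To Waivers", and also the date of the transaction. I can get the first part, but I'm not sure how to get the date. A sample of the HTML comes first, followed by my code: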

        <table class="Table Table-mid Tst-transaction-table">
                <tr>
        <td class="Grid-u-1-12 Ta-c"><span class="F-icon Block Fz-lg F-positive Cur-h" title="Added Player">&#xe035;</span><span class="F-icon Block Fz-lg F-negative Ptop-med Cur-h" title="Dropped Player">&#xe033;</span></td>
        <td class="Fill-x No-pstart" colspan="2">
            <div class="Pbot-xs">        <a href="http://sports.yahoo.com/nfl/players/24963" target=sports onclick="pop(this)">Dwayne Harris</a>
        <span class="F-position Fz-xxs">NYG - WR</span>
        <a href="http://sports.yahoo.com/nfl/players/24963/news" class="yfa-icon playernote playernote-recent" data-ys-playerid="24963" data-ys-playernote-view="notes" target="_blank" id="playernote-'.24963.'"></a>  <h6 class="F-shade Fz-xxs"> Waiver </h6></div>
                    <div class="Pbot-xs">        <a href="http://sports.yahoo.com/nfl/players/6791" target=sports onclick="pop(this)">Benjamin Watson</a>
        <span class="F-position Fz-xxs">NO - TE</span>
        <a href="http://sports.yahoo.com/nfl/players/6791/news" class="yfa-icon playernote playernote-recent" data-ys-playerid="6791" data-ys-playernote-view="notes" target="_blank" id="playernote-'.6791.'"></a>  <h6 class="F-shade Fz-xxs"> To Waivers</h6></div>
        </td>
        <td class="Ta-end">
            <div class="Grid-h-top Nowrap Fz-xxs">
    <span class="Grid-u">
      <a class="Tst-team-name" href="/f1/313652/10">TeamName2</a> 

      <span class="Block F-timestamp Fz-xxs Nowrap">Nov 20, 4:03 am</span>
    </span>
    <a class='Grid-u' href='/f1/313652/10'><img class="Avatar-sm Mstart-med Grid-u" src="http://l.yimg.com/dh/ap/fantasy/nfl/img/icon_01_100.png" alt="avatar"> </a>
</div>
        </td>
    </tr>    <tr>
        <td class="Grid-u-1-12 Ta-c"><span class="F-icon Block Fz-lg F-positive Cur-h" title="Added Player">&#xe035;</span><span class="F-icon Block Fz-lg F-negative Ptop-med Cur-h" title="Dropped Player">&#xe033;</span></td>
        <td class="Fill-x No-pstart" colspan="2">
            <div class="Pbot-xs">        <a href="http://sports.yahoo.com/nfl/players/7306" target=sports onclick="pop(this)">Darren Sproles</a>
        <span class="F-position Fz-xxs">Phi - RB</span>
        <a href="http://sports.yahoo.com/nfl/players/7306/news" class="yfa-icon playernote playernote-recent" data-ys-playerid="7306" data-ys-playernote-view="notes" target="_blank" id="playernote-'.7306.'"></a>  <h6 class="F-shade Fz-xxs">Free Agent </h6></div>
                    <div class="Pbot-xs">        <a href="http://sports.yahoo.com/nfl/players/24262" target=sports onclick="pop(this)">Joique Bell</a>
        <span class="F-position Fz-xxs">Det - RB</span>
         <span class="F-injury Fz-xxs" title="Probable">P</span>
        <a href="http://sports.yahoo.com/nfl/players/24262/news" class="yfa-icon playernote playernote-old" data-ys-playerid="24262" data-ys-playernote-view="notes" target="_blank" id="playernote-'.24262.'"></a>  <h6 class="F-shade Fz-xxs"> To Waivers</h6></div>
        </td>
        <td class="Ta-end">
            <div class="Grid-h-top Nowrap Fz-xxs">
    <span class="Grid-u">
      <a class="Tst-team-name" href="/f1/313652/3">TeamName1</a> 
      <span class="Block F-timestamp Fz-xxs Nowrap">Nov 19, 1:30 pm</span>
    </span>
    <a class='Grid-u' href='/f1/313652/3'><img class="Avatar-sm Mstart-med Grid-u" src="http://l.yimg.com/dh/ap/fantasy/img/profile_48.png" alt="avatar"> </a>
</div>
        </td>

Code:

import mechanize
from bs4 import BeautifulSoup
import urllib

username = 'my-username'
password = 'my-password'

br = mechanize.Browser()
br.addheaders = [('User-agent', 'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.6')]
br.open("https://football.fantasysports.yahoo.com/f1/313652/transactions")
br.select_form(nr=0)
br.form["username"] = username
br.form["passwd"] = password
response = br.submit()
html_scrape = response.read()
soup = BeautifulSoup(html_scrape, "lxml")

for lines in soup.find_all('div', attrs={'class': 'Pbot-xs'}):
    players = lines.find('a').get_text()
    status = lines.find('h6').get_text()
    if (status == ' To Waivers'):
        print "%s was dropped" % players

I have tried using the find() function on the table, but I can't figure out how to get the text data I'm looking for.

Thanks!

2 answers:

Answer 0 (score: 0)

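Without being familiar with the Yahoo Fantasy Football page, it's hard to give you a definitive answer, but I can point you in the right direction. If you want to target a specific div with BeautifulSoup, you should use BeautifulSoup's select function, which takes a CSS selector: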

for lines in soup.select("div.Pbot-xs"):  # "." selects by class; "#" would select by id
    players = lines.find('a').text
    status = lines.find('h6').text.strip()  # the label text is padded with spaces
    if status == 'To Waivers':
        print "%s was dropped." % players
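As a self-contained illustration of the class selector, here is a minimal sketch that runs against a trimmed copy of the question's HTML instead of the live page. The `SAMPLE_HTML` string and the `dropped_players` helper are hypothetical names introduced for this example, and `html.parser` is used only so that it runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of two player blocks from the question's HTML.
SAMPLE_HTML = """
<div class="Pbot-xs"><a href="http://sports.yahoo.com/nfl/players/24963">Dwayne Harris</a>
<h6 class="F-shade Fz-xxs"> Waiver </h6></div>
<div class="Pbot-xs"><a href="http://sports.yahoo.com/nfl/players/6791">Benjamin Watson</a>
<h6 class="F-shade Fz-xxs"> To Waivers</h6></div>
"""


def dropped_players(html):
    """Return the names of players whose status label reads 'To Waivers'."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    # "div.Pbot-xs" matches <div> elements carrying the class Pbot-xs.
    # Class names are case-sensitive, so "pBot-xs" would match nothing.
    for block in soup.select("div.Pbot-xs"):
        link = block.find("a")
        label = block.find("h6")
        if link is None or label is None:
            continue  # skip blocks that lack a name or a status label
        # get_text(strip=True) drops the whitespace around the label.
        if label.get_text(strip=True) == "To Waivers":
            names.append(link.get_text(strip=True))
    return names


print(dropped_players(SAMPLE_HTML))
```

Stripping the label before comparing it avoids depending on the exact padding inside the `<h6>` element, which differs between rows in the sample.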

Answer 1 (score: 0)

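This was a bit tricky because in fantasy sports you can drop a player without necessarily adding one. I solved it by going through the list and adding the player names and dates in order. A player had to match the "To Waivers" attribute. Then I set up a try/except block to make sure the previous object in the iteration had a corresponding player. This ensured that my dictionary values were in the order Player > Date > Player > Date, and so on.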

Then I iterated over the dictionary and printed it in the format I wanted:

parse
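For comparison, here is a minimal sketch of a simpler way to pair each dropped player with a date. It is not the dictionary approach described in this answer: it walks the table row by row, on the assumption (based on the HTML in the question) that every `<tr>` holds its player blocks and a single `F-timestamp` span. The `SAMPLE_HTML` string and the `dropped_with_dates` helper are hypothetical names for illustration.

```python
from bs4 import BeautifulSoup

# Reduced version of the question's table: two transactions, one timestamp each.
SAMPLE_HTML = """
<table class="Table Table-mid Tst-transaction-table">
<tr>
  <td><div class="Pbot-xs"><a href="#">Dwayne Harris</a><h6> Waiver </h6></div>
      <div class="Pbot-xs"><a href="#">Benjamin Watson</a><h6> To Waivers</h6></div></td>
  <td><span class="Block F-timestamp Fz-xxs Nowrap">Nov 20, 4:03 am</span></td>
</tr>
<tr>
  <td><div class="Pbot-xs"><a href="#">Darren Sproles</a><h6>Free Agent </h6></div>
      <div class="Pbot-xs"><a href="#">Joique Bell</a><h6> To Waivers</h6></div></td>
  <td><span class="Block F-timestamp Fz-xxs Nowrap">Nov 19, 1:30 pm</span></td>
</tr>
</table>
"""


def dropped_with_dates(html):
    """Return (player, timestamp) pairs for every player sent to waivers."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # Each row is one transaction, so the timestamp found inside a row
    # belongs to every player block in that same row.
    for row in soup.select("table.Tst-transaction-table tr"):
        stamp = row.select_one("span.F-timestamp")
        if stamp is None:
            continue  # not a transaction row
        date = stamp.get_text(strip=True)
        for block in row.select("div.Pbot-xs"):
            link = block.find("a")
            label = block.find("h6")
            if link is None or label is None:
                continue
            if label.get_text(strip=True) == "To Waivers":
                results.append((link.get_text(strip=True), date))
    return results


for player, date in dropped_with_dates(SAMPLE_HTML):
    print("%s was dropped on %s" % (player, date))
```

Scoping the search to one row at a time means a drop with no matching add still gets the right date, because nothing depends on the order in which names and dates appear in a flat list.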