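For fun, I'm trying to scrape some player transaction data from my Yahoo fantasy football league. This is my first time using mechanize and BeautifulSoup, and I'm having trouble printing specific data. What I want to extract is the player's name if they were sent "To Waivers", along with the date. I can get the first part, but I'm not sure how to get the date. A sample of the HTML comes first, followed by my code: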
<table class="Table Table-mid Tst-transaction-table">
<tr>
<td class="Grid-u-1-12 Ta-c"><span class="F-icon Block Fz-lg F-positive Cur-h" title="Added Player"></span><span class="F-icon Block Fz-lg F-negative Ptop-med Cur-h" title="Dropped Player"></span></td>
<td class="Fill-x No-pstart" colspan="2">
<div class="Pbot-xs"> <a href="http://sports.yahoo.com/nfl/players/24963" target=sports onclick="pop(this)">Dwayne Harris</a>
<span class="F-position Fz-xxs">NYG - WR</span>
<a href="http://sports.yahoo.com/nfl/players/24963/news" class="yfa-icon playernote playernote-recent" data-ys-playerid="24963" data-ys-playernote-view="notes" target="_blank" id="playernote-'.24963.'"></a> <h6 class="F-shade Fz-xxs"> Waiver </h6></div>
<div class="Pbot-xs"> <a href="http://sports.yahoo.com/nfl/players/6791" target=sports onclick="pop(this)">Benjamin Watson</a>
<span class="F-position Fz-xxs">NO - TE</span>
<a href="http://sports.yahoo.com/nfl/players/6791/news" class="yfa-icon playernote playernote-recent" data-ys-playerid="6791" data-ys-playernote-view="notes" target="_blank" id="playernote-'.6791.'"></a> <h6 class="F-shade Fz-xxs"> To Waivers</h6></div>
</td>
<td class="Ta-end">
<div class="Grid-h-top Nowrap Fz-xxs">
<span class="Grid-u">
<a class="Tst-team-name" href="/f1/313652/10">TeamName2</a>
<span class="Block F-timestamp Fz-xxs Nowrap">Nov 20, 4:03 am</span>
</span>
<a class='Grid-u' href='/f1/313652/10'><img class="Avatar-sm Mstart-med Grid-u" src="http://l.yimg.com/dh/ap/fantasy/nfl/img/icon_01_100.png" alt="avatar"> </a>
</div>
</td>
</tr> <tr>
<td class="Grid-u-1-12 Ta-c"><span class="F-icon Block Fz-lg F-positive Cur-h" title="Added Player"></span><span class="F-icon Block Fz-lg F-negative Ptop-med Cur-h" title="Dropped Player"></span></td>
<td class="Fill-x No-pstart" colspan="2">
<div class="Pbot-xs"> <a href="http://sports.yahoo.com/nfl/players/7306" target=sports onclick="pop(this)">Darren Sproles</a>
<span class="F-position Fz-xxs">Phi - RB</span>
<a href="http://sports.yahoo.com/nfl/players/7306/news" class="yfa-icon playernote playernote-recent" data-ys-playerid="7306" data-ys-playernote-view="notes" target="_blank" id="playernote-'.7306.'"></a> <h6 class="F-shade Fz-xxs">Free Agent </h6></div>
<div class="Pbot-xs"> <a href="http://sports.yahoo.com/nfl/players/24262" target=sports onclick="pop(this)">Joique Bell</a>
<span class="F-position Fz-xxs">Det - RB</span>
<span class="F-injury Fz-xxs" title="Probable">P</span>
<a href="http://sports.yahoo.com/nfl/players/24262/news" class="yfa-icon playernote playernote-old" data-ys-playerid="24262" data-ys-playernote-view="notes" target="_blank" id="playernote-'.24262.'"></a> <h6 class="F-shade Fz-xxs"> To Waivers</h6></div>
</td>
<td class="Ta-end">
<div class="Grid-h-top Nowrap Fz-xxs">
<span class="Grid-u">
<a class="Tst-team-name" href="/f1/313652/3">TeamName1</a>
<span class="Block F-timestamp Fz-xxs Nowrap">Nov 19, 1:30 pm</span>
</span>
<a class='Grid-u' href='/f1/313652/3'><img class="Avatar-sm Mstart-med Grid-u" src="http://l.yimg.com/dh/ap/fantasy/img/profile_48.png" alt="avatar"> </a>
</div>
</td>
Code:
import mechanize
from bs4 import BeautifulSoup
import urllib
username = 'my-username'
password = 'my-password'
br = mechanize.Browser()
br.addheaders = [('User-agent', 'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.6')]
br.open("https://football.fantasysports.yahoo.com/f1/313652/transactions")
br.select_form(nr=0)
br.form["username"] = username
br.form["passwd"] = password
response = br.submit()
html_scrape = response.read()
soup = BeautifulSoup(html_scrape, "lxml")
for lines in soup.find_all('div', attrs={'class': 'Pbot-xs'}):
    players = lines.find('a').get_text()
    status = lines.find('h6').get_text()
    if (status == ' To Waivers'):
        print "%s was dropped" % players
I've tried using the find() function on the table, but I can't figure out how to get the text data I'm looking for.
Thanks!
Answer 0 (score: 0)
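Without being familiar with the Yahoo fantasy football page, it's hard to give you an exact answer, but I can point you in the right direction. To target specific divs with BeautifulSoup, you should use BeautifulSoup's select function: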
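for lines in soup.select("div.Pbot-xs"):  # "." selects by class; "#" would select by id
    players = lines.find('a').text
    # strip() removes the padding around the <h6> text, e.g. " To Waivers"
    status = lines.find('h6').text.strip()
    if status == 'To Waivers':
        print("%s was dropped." % players)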
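To also get the date asked about in the question, one option is to walk up from each player div to its enclosing table row and read the timestamp span there. The following is only a minimal sketch, assuming the markup matches the question's sample (trimmed here to one row) and using the built-in `html.parser`; `SAMPLE_HTML` and `dropped_with_dates` are names made up for illustration and are not part of the original code.

```python
from bs4 import BeautifulSoup

# Trimmed copy of one transaction row from the question's sample HTML.
SAMPLE_HTML = """
<table class="Table Table-mid Tst-transaction-table">
<tr>
<td class="Fill-x No-pstart" colspan="2">
<div class="Pbot-xs"> <a href="http://sports.yahoo.com/nfl/players/24963">Dwayne Harris</a>
<h6 class="F-shade Fz-xxs"> Waiver </h6></div>
<div class="Pbot-xs"> <a href="http://sports.yahoo.com/nfl/players/6791">Benjamin Watson</a>
<h6 class="F-shade Fz-xxs"> To Waivers</h6></div>
</td>
<td class="Ta-end">
<span class="Block F-timestamp Fz-xxs Nowrap">Nov 20, 4:03 am</span>
</td>
</tr>
</table>
"""


def dropped_with_dates(html):
    """Return (player, timestamp) pairs for players sent to waivers."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for div in soup.select("div.Pbot-xs"):
        link = div.find("a")
        status = div.find("h6")
        if link is None or status is None:
            continue  # not a player entry in the expected shape
        if status.get_text(strip=True) != "To Waivers":
            continue  # player was added, not dropped
        # The timestamp lives in a sibling cell of the same table row.
        row = div.find_parent("tr")
        stamp = row.find("span", class_="F-timestamp") if row else None
        date = stamp.get_text(strip=True) if stamp else None
        results.append((link.get_text(strip=True), date))
    return results


for player, date in dropped_with_dates(SAMPLE_HTML):
    print("%s was dropped on %s" % (player, date))
```

On the real page you would pass `html_scrape` from the question instead of `SAMPLE_HTML`; whether the live markup still matches this sample is an assumption.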
Answer 1 (score: 0)
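This is a bit tricky, because in fantasy sports you can drop a player without necessarily adding one. I solved it by walking through the list and adding player names and dates in order. The player must match the "To Waivers" attribute. I then set up a try/except block to make sure the previous object in the iteration has a corresponding Player. This ensures that my dictionary values go Player > date > Player > date, and so on.
I then loop over the dictionary and print it formatted the way I want: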
parse
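The approach described in this answer could look roughly like the sketch below. It is a reconstruction under assumptions, not the answerer's actual code: it uses a flat list rather than a dictionary, the built-in `html.parser`, and a trimmed two-row copy of the question's sample; `SAMPLE_HTML` and `collect_drops` are hypothetical names.

```python
from bs4 import BeautifulSoup

# Two trimmed transaction rows based on the question's sample HTML.
SAMPLE_HTML = """
<table class="Table Table-mid Tst-transaction-table">
<tr>
<td><div class="Pbot-xs"><a href="#">Dwayne Harris</a><h6> Waiver </h6></div>
<div class="Pbot-xs"><a href="#">Benjamin Watson</a><h6> To Waivers</h6></div></td>
<td><span class="Block F-timestamp Fz-xxs Nowrap">Nov 20, 4:03 am</span></td>
</tr>
<tr>
<td><div class="Pbot-xs"><a href="#">Darren Sproles</a><h6>Free Agent </h6></div>
<div class="Pbot-xs"><a href="#">Joique Bell</a><h6> To Waivers</h6></div></td>
<td><span class="Block F-timestamp Fz-xxs Nowrap">Nov 19, 1:30 pm</span></td>
</tr>
</table>
"""


def collect_drops(html):
    """Build a flat list alternating player, date, player, date, ..."""
    soup = BeautifulSoup(html, "html.parser")
    flat = []
    for row in soup.find_all("tr"):
        stamp = row.find("span", class_="F-timestamp")
        if stamp is None:
            continue  # row carries no transaction timestamp
        date = stamp.get_text(strip=True)
        for div in row.find_all("div", class_="Pbot-xs"):
            try:
                player = div.find("a").get_text(strip=True)
                status = div.find("h6").get_text(strip=True)
            except AttributeError:
                # find() returned None: no player link or no status label
                continue
            if status == "To Waivers":
                # Append the pair together so the order never drifts.
                flat.extend([player, date])
    return flat


flat = collect_drops(SAMPLE_HTML)
# Pair even-indexed players with the odd-indexed dates that follow them.
for player, date in zip(flat[0::2], flat[1::2]):
    print("%s was dropped on %s" % (player, date))
```

Appending the player and the date in a single step keeps the alternating order intact even when a row contains a drop with no matching add.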