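Is it possible to extract/scrape the scores and goals with Python from the following web page: http://www.uscho.com/standings/division-i-men/2011-2012/? My problem is that the table structure is quite bare. Are there any resources that could help me solve this?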
Answer 0 (score: 2)
Have you tried Beautiful Soup?
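A minimal sketch of what that might look like with Beautiful Soup. It assumes each conference lives in a `<section>` with an `<h3>` title followed by a `<table>` of rows; the inline `SAMPLE_HTML` and the `parse_standings` helper are made up for illustration, and the real page may be laid out differently.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Hypothetical sample markup (an assumption, not copied from the real page).
SAMPLE_HTML = """
<section id="section_1">
  <h3>Atlantic Hockey</h3>
  <table><tbody>
    <tr><td>Air Force</td><td>8</td><td>2</td><td>1</td><td>40-26</td></tr>
    <tr><td>Mercyhurst</td><td>6</td><td>1</td><td>2</td><td>21-15</td></tr>
  </tbody></table>
</section>
"""


def parse_standings(html):
    """Return {conference name: [[cell text, ...], ...]} for each <section>."""
    soup = BeautifulSoup(html, "html.parser")
    standings = {}
    for section in soup.find_all("section"):
        title = section.find("h3").get_text(strip=True)
        rows = []
        for tr in section.find_all("tr"):
            cells = [td.get_text(strip=True) for td in tr.find_all("td")]
            if cells:  # skip rows without <td> cells, e.g. header rows
                rows.append(cells)
        standings[title] = rows
    return standings


if __name__ == "__main__":
    for conference, rows in parse_standings(SAMPLE_HTML).items():
        print(conference)
        for row in rows:
            print(" ", row[0].ljust(25), " ".join(row[1:]))
```

To run it against the live page you would fetch the HTML first (for example with `urllib.request.urlopen`) and pass the response body to `parse_standings`.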
Answer 1 (score: 1)
mechanize and BeautifulSoup
Answer 2 (score: 1)
The example page is very easy to parse with lxml.
Here is a basic script to get you started:
from urllib.request import urlopen
from lxml import etree

url = 'http://www.uscho.com/standings/division-i-men/2011-2012/'
tree = etree.HTML(urlopen(url).read())

# Each conference sits in a <section> whose id starts with "section_"
for section in tree.xpath('//section[starts-with(@id, "section_")]'):
    print(section.xpath('h3[1]/text()')[0])  # conference name
    for row in section.xpath('table/tbody/tr'):
        cols = row.xpath('td//text()')  # text of every cell in the row
        print(' ', cols[0].ljust(25), ' '.join(cols[1:]))
    print()  # blank line between conferences
Output:
Atlantic Hockey
  Air Force                 8 2 1 .773 17 40-26 9 4 2 .667 53-36 6 0 1 3 3 1
  Mercyhurst                6 1 2 .778 14 21-15 7 7 2 .500 36-49 5 1 1 2 4 1
  RIT                       5 3 2 .600 12 24-20 6 5 2 .538 30-32 5 2 2 1 3 0
  Robert Morris             5 2 1 .688 11 31-20 7 6 1 .536 44-43 3 2 1 3 3 0
  Bentley                   4 3 2 .556 10 25-18 4 8 3 .367 35-43 1 2 2 3 6 1
  Canisius                  4 3 2 .556 10 16-17 4 8 3 .367 23-41 2 2 1 2 6 2
  Holy Cross                5 4 0 .556 10 28-26 7 7 0 .500 40-47 5 1 0 2 6 0
  Niagara                   3 2 4 .556 10 25-22 4 5 5 .464 36-39 1 2 2 3 3 3
  Connecticut               4 5 1 .450 9 30-24 5 8 2 .400 41-42 3 1 0 1 7 2
  American International    2 7 2 .273 6 24-36 3 12 2 .235 35-58 1 4 2 2 8 0
  Army                      1 5 4 .300 6 20-33 1 7 6 .286 26-47 0 4 2 1 3 3
  Sacred Heart              0 10 1 .045 1 30-57 1 14 1 .094 39-86 0 5 1 0 9 0

CCHA
  Ohio State                9 2 1 1 .792 29 42-26 12 3 1 .781 53-31 6 1 1 6 2 0
  Notre Dame                7 2 3 0 .708 24 36-28 10 5 3 .639 55-50 6 3 0 4 2 3
  Western Michigan          6 4 2 2 .583 22 33-28 8 4 4 .625 49-34 5 2 1 3 2 3
  Lake Superior             6 5 1 1 .542 20 31-32 10 6 2 .611 46-43 5 3 0 5 3 2
  Ferris State              6 5 1 0 .542 19 28-27 10 5 1 .656 43-30 5 1 1 5 4 0
  Michigan State            6 4 0 0 .600 18 32-23 10 5 1 .656 56-41 6 1 1 3 3 0
  Northern Michigan         4 5 3 2 .458 17 28-31 7 6 3 .531 41-40 6 1 3 1 5 0
  Miami                     4 6 2 1 .417 15 26-31 8 8 2 .500 48-48 3 3 2 4 5 0
  Michigan                  4 6 2 1 .417 15 36-32 8 8 2 .500 64-47 7 5 0 1 3 2
  Alaska                    4 8 2 0 .357 14 26-33 7 9 2 .444 39-41 4 5 1 2 3 1
  Bowling Green             1 10 1 1 .125 5 14-41 6 10 2 .389 32-49 3 6 1 3 4 1

D-I Independent
  Alabama-Huntsville        0 0 0 .000 0 - 1 15 1 .088 16-67 1 8 1 0 7 0

ECAC
  Cornell                   6 1 1 .812 13 26-11 7 3 1 .682 32-18 4 1 1 3 1 0
  Colgate                   6 2 0 .750 12 28-15 11 4 1 .719 55-36 5 2 0 5 2 0
  Clarkson                  3 4 2 .444 8 19-18 9 6 4 .579 55-37 6 2 0 3 3 4
  St. Lawrence              4 5 0 .444 8 16-22 5 10 0 .333 31-52 3 6 0 2 4 0
  Union                     3 2 2 .571 8 16-13 7 3 5 .633 49-29 1 2 2 6 1 3
  Yale                      4 2 0 .667 8 19-15 6 4 1 .591 36-31 3 2 0 3 1 0
  Dartmouth                 3 3 1 .500 7 18-22 4 5 1 .450 24-30 3 3 1 1 2 0
  Princeton                 3 5 1 .389 7 23-30 4 7 2 .385 30-39 2 2 1 1 4 0
  Quinnipiac                2 4 3 .389 7 18-22 9 6 3 .583 57-40 6 1 2 3 5 1
  Brown                     3 3 0 .500 6 19-20 4 6 1 .409 24-30 2 2 0 1 4 1
  Harvard                   2 3 2 .429 6 20-21 3 3 3 .500 31-31 2 2 1 1 1 2
  Rensselaer                1 6 0 .143 2 8-21 3 12 0 .200 18-42 2 5 0 1 7 0

Hockey East
  Boston College            9 3 0 .750 18 45-29 12 5 0 .706 63-42 5 3 0 6 2 0
  Boston University         6 4 1 .591 13 37-34 8 5 1 .607 47-43 5 3 0 2 2 1
  Merrimack                 6 2 1 .722 13 23-18 9 2 1 .792 37-20 4 1 1 5 1 0
  Massachusetts-Lowell      6 3 0 .667 12 33-27 9 4 0 .692 46-33 4 1 0 5 2 0
  Providence                6 4 0 .600 12 37-29 8 7 1 .531 51-47 7 2 1 1 3 0
  Maine                     5 5 1 .500 11 37-35 6 6 2 .500 45-44 4 3 0 2 3 2
  New Hampshire             4 6 1 .409 9 31-37 6 8 2 .438 56-56 6 2 0 0 6 2
  Northeastern              3 7 2 .333 8 31-35 6 7 2 .467 46-39 2 2 1 4 5 1
  Massachusetts             2 6 3 .318 7 29-39 4 7 4 .400 47-52 4 0 3 0 7 1
  Vermont                   1 8 1 .150 3 22-42 3 10 1 .250 33-59 2 5 1 1 5 0

WCHA
  Minnesota                 10 2 0 .833 20 43-23 13 4 1 .750 75-36 8 1 0 5 3 1
  Minnesota-Duluth          9 2 1 .792 19 52-27 11 3 2 .750 66-39 7 3 0 4 0 2
  Nebraska-Omaha            6 3 3 .625 15 44-41 8 7 3 .528 60-58 5 2 1 3 4 2
  Colorado College          6 4 0 .600 12 44-36 8 4 0 .667 52-38 5 0 0 3 4 0
  North Dakota              6 6 0 .500 12 37-35 8 7 1 .531 49-48 5 2 1 3 5 0
  Denver                    4 3 3 .550 11 39-34 6 5 3 .536 51-44 5 2 2 1 3 1
  Michigan Tech             5 6 1 .458 11 36-35 8 7 1 .531 48-43 6 3 1 2 4 0
  St. Cloud State           4 5 3 .458 11 36-37 6 8 4 .444 57-58 3 1 3 2 7 1
  Bemidji State             4 6 2 .417 10 32-42 6 8 2 .438 43-52 3 2 1 3 6 1
  Wisconsin                 4 7 1 .375 9 35-43 7 8 1 .469 52-52 7 3 0 0 5 1
  Alaska-Anchorage          2 9 1 .208 5 20-47 5 9 2 .375 37-56 2 5 1 1 4 1
  Minnesota State           2 9 1 .208 5 34-52 3 12 1 .219 39-64 1 4 1 2 8 0
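Since the question asks specifically about goals, here is a rough sketch of how the `40-26` style columns could be turned into numbers. The `split_goals` helper is hypothetical, and reading the first pair as conference goals and the second as overall goals is an assumption about the page's column order, not something the output itself confirms.

```python
import re

# Matches a whole column of the form "<goals for>-<goals against>", e.g. "40-26".
GOALS = re.compile(r"^(\d+)-(\d+)$")


def split_goals(cols):
    """Return a (goals_for, goals_against) tuple for every 'GF-GA' style column."""
    pairs = []
    for col in cols:
        match = GOALS.match(col)
        if match:
            pairs.append((int(match.group(1)), int(match.group(2))))
    return pairs


# One row from the output, split on whitespace for illustration.
row = "Air Force 8 2 1 .773 17 40-26 9 4 2 .667 53-36 6 0 1 3 3 1".split()
print(split_goals(row))  # [(40, 26), (53, 36)]
```

A lone `-` (as in the Alabama-Huntsville row) does not match the pattern, so a team with no conference games simply yields one pair fewer.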