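I am trying to extract table data from the website http://bet.hkjc.com/racing/index.aspx?date=22-01-2017&venue=ST&raceno=1&lang=en. The table with the horse data is the one I want to extract. I am using this code, but it returns an empty list: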
import requests
from lxml import html

page = requests.get("http://bet.hkjc.com/racing/index.aspx?date=22-01-2017&venue=ST&raceno=1&lang=en")
tree = html.fromstring(page.content)
temp = tree.xpath('//*[@id="horseTable"]')  # every element whose id is "horseTable"
print(temp)
Please help!
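An empty list from `tree.xpath` means that no element with `id="horseTable"` was found in the HTML that was parsed, so a reasonable first step is to see which ids the downloaded markup actually contains. The following is a minimal sketch of that check. It uses the standard library's `html.parser` instead of lxml so that it has no third-party dependencies; the names `IdCollector` and `collect_ids` and the sample markup are hypothetical and not taken from the real page.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect every id attribute that appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def collect_ids(html_text):
    """Return the set of ids found in html_text (hypothetical helper)."""
    parser = IdCollector()
    parser.feed(html_text)
    parser.close()
    return parser.ids


# Hand-written stand-in markup, NOT the real page source
sample = '<html><body><div id="main"><table id="other"></table></div></body></html>'
print("horseTable" in collect_ids(sample))
```

With the real page, `page.text` from the code above could be passed to `collect_ids`. If `horseTable` is not in the result, the table is not part of the markup returned for that request. One possible explanation, though not the only one, is that a script fills it in after the page loads.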
Answer 0 (score: 0)
I think you should take a look at the solution given at the following link:
https://chihacknight.org/blog/2014/11/26/an-intro-to-web-scraping-with-python.html
It is a very useful link if you work from its code and modify it to suit your needs.
Answer 1 (score: 0)
I strongly recommend using requests to fetch the page and BeautifulSoup to parse it.
Here is an example:
import bs4
import requests

content = requests.get("http://bet.hkjc.com/racing/index.aspx?date=22-01-2017&venue=ST&raceno=1&lang=en").text  # Get page content
soup = bs4.BeautifulSoup(content, 'lxml')  # Parse page content (requires the lxml package)
table = soup.find('table', {'id': 'horseTable'})  # Locate the table tag by its id
rows = table.find_all('tr')  # Find all row tags in that table
for row in rows:
    columns = row.find_all('td')  # Find all data cells in this row
    for column in columns:
        print(column.text.strip(), end=' ')  # Output each cell on the same line
    print()  # End the line after each row
Output:
T/P No. Colour Horse Draw Wt. Jockey Trainer Body Wt. Rtg. Gear Last 6 Runs
1 HEALTHY LUCK 3 133 D Whyte K L Man 59 1
2 GRACE HEART 11 130 C Y Ho (-2) C Fownes 56 B 12/8/9/11/7/7
3 CITY LEGEND 7 126 K Teetan T P Yung 52
4 GENERAL O'REILLY 4 126 K C Ng (-5) Y S Tsui 52
5 JOLLY AMBER 9 126 O Murphy P F Yiu 52 B1
6 KEEP MOVING 12 126 N Callan P F Yiu 52
7 LUNAR ZEPHYR 13 126 M L Yeung (-2) T P Yung 52 V1
8 MERRYGOWIN 5 126 Z Purton P O'Sullivan 52
9 VICTORY MUSIC 8 126 T Berry J Moore 52 SR 10
10 VITAL SPRING 1 126 J Moreira J Size 52
11 PRINCE HARMONY 14 122 S de Sousa W Y So 48 H/P 9/6/10
12 FUN MANAGER 6 120 T H So (-2) C H Yip 46 10/9/10/14/8/10
13 MASSIVE MOVE 10 118 K K Chiong (-5) L Ho 44 V 3/8/6/3/7/5
14 HAPPY SOUND 2 116 H W Lai A Lee 42 E-/B/TT 12/11/11/5/9/1
Standby Horse
T/P No. Horse Wt. Trainer Body Wt. Rtg. Gear Last 6 Runs
1 BELOVED 131 P F Yiu 57 H 11
2 EXPONENTS 124 Y S Tsui 50 B 7/7
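The nested loop in the answer above prints each cell as it goes. When the values are needed for further processing, it can help to collect them into a list of rows and to get an empty result, rather than an error, when the table id is not found. The following is a minimal sketch of that idea. It swaps BeautifulSoup for the standard library's `html.parser` so that it runs without third-party packages; the names `TableRows` and `table_rows` and the sample markup are hypothetical and do not reproduce the real page.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of each row in the table with a given id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.depth = 0    # 0 = outside the target table, 1 = directly inside it
        self.rows = []    # finished rows, each a list of cell strings
        self.row = None   # cells of the row currently being read
        self.cell = None  # text fragments of the cell currently being read

    def _close_cell(self):
        if self.cell is not None:
            # Join the fragments and collapse runs of whitespace
            self.row.append(" ".join("".join(self.cell).split()))
            self.cell = None

    def _close_row(self):
        self._close_cell()
        if self.row is not None:
            self.rows.append(self.row)
            self.row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            # Enter the target table, or track a table nested inside it
            if self.depth or dict(attrs).get("id") == self.table_id:
                self.depth += 1
        elif self.depth == 1:
            if tag == "tr":
                self._close_row()
                self.row = []
            elif tag in ("td", "th") and self.row is not None:
                self._close_cell()
                self.cell = []

    def handle_endtag(self, tag):
        if tag == "table":
            if self.depth == 1:
                self._close_row()
            if self.depth:
                self.depth -= 1
        elif self.depth == 1:
            if tag in ("td", "th"):
                self._close_cell()
            elif tag == "tr":
                self._close_row()

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


def table_rows(html_text, table_id):
    """Return the rows of the table with id table_id, or [] if it is absent."""
    parser = TableRows(table_id)
    parser.feed(html_text)
    parser.close()
    return parser.rows


# Hand-written stand-in markup, NOT the real page source
sample = """
<table id="horseTable">
  <tr><td>No.</td><td>Horse</td><td>Draw</td></tr>
  <tr><td>1</td><td> HEALTHY LUCK </td><td>3</td></tr>
</table>
"""
for row in table_rows(sample, "horseTable"):
    print(" ".join(row))
```

Returning a list of lists keeps the parsing separate from the printing, so the same rows could later be written to a CSV file or filtered by column. The text of a table nested inside a cell is folded into that cell in this sketch.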