Question

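I am trying to extract table data from the website http://bet.hkjc.com/racing/index.aspx?date=22-01-2017&venue=ST&raceno=1&lang=en. The table containing the horse data is the one I want to extract. I am using the following code, but it returns an empty list: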

import requests
from lxml import html

page = requests.get("http://bet.hkjc.com/racing/index.aspx?date=22-01-2017&venue=ST&raceno=1&lang=en")
tree = html.fromstring(page.content)
temp = tree.xpath('//*[@id="horseTable"]')
print(temp)

Please help!

Answer 1

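I think you should take a look at the solution provided at the following link:

https://chihacknight.org/blog/2014/11/26/an-intro-to-web-scraping-with-python.html

It is a very useful link if you work from its code and modify it to suit your needs.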

Answer 2

I strongly recommend using requests to fetch the page and BeautifulSoup to parse it.

Here is an example:

import bs4
import requests

url = "http://bet.hkjc.com/racing/index.aspx?date=22-01-2017&venue=ST&raceno=1&lang=en"
content = requests.get(url).text  # Get page content
soup = bs4.BeautifulSoup(content, 'lxml')  # Parse page content

table = soup.find('table', {'id': 'horseTable'})  # Locate that table tag
rows = table.find_all('tr')  # Find all row tags in that table

for row in rows:
    columns = row.find_all('td')  # Find all data cells in each row

    for column in columns:
        print(column.text.strip(), end=' ')  # Output the text of each cell

    print()  # End the line after each row

Output:

T/P No. Colour Horse Draw Wt. Jockey Trainer Body Wt. Rtg. Gear Last 6 Runs 
 1  HEALTHY LUCK 3 133 D Whyte K L Man  59  1 
 2  GRACE HEART 11 130 C Y Ho (-2) C Fownes  56 B 12/8/9/11/7/7 
 3  CITY LEGEND 7 126 K Teetan T P Yung  52   
 4  GENERAL O'REILLY 4 126 K C Ng (-5) Y S Tsui  52   
 5  JOLLY AMBER 9 126 O Murphy P F Yiu  52 B1  
 6  KEEP MOVING 12 126 N Callan P F Yiu  52   
 7  LUNAR ZEPHYR 13 126 M L Yeung (-2) T P Yung  52 V1  
 8  MERRYGOWIN 5 126 Z Purton P O'Sullivan  52   
 9  VICTORY MUSIC 8 126 T Berry J Moore  52 SR 10 
 10  VITAL SPRING 1 126 J Moreira J Size  52   
 11  PRINCE HARMONY 14 122 S de Sousa W Y So  48 H/P 9/6/10 
 12  FUN MANAGER 6 120 T H So (-2) C H Yip  46  10/9/10/14/8/10 
 13  MASSIVE MOVE 10 118 K K Chiong (-5) L Ho  44 V 3/8/6/3/7/5 
 14  HAPPY SOUND 2 116 H W Lai A Lee  42 E-/B/TT 12/11/11/5/9/1 


Standby Horse 
T/P No.  Horse  Wt.  Trainer Body Wt. Rtg. Gear Last 6 Runs 
 1  BELOVED  131  P F Yiu  57 H 11 
 2  EXPONENTS  124  Y S Tsui  50 B 7/7
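
If you would rather collect the rows into a list than print them, or you cannot install extra packages, the same walk over rows and cells can be sketched with only the standard library's html.parser. Note that this swaps BeautifulSoup for a different parser. The names TableRows, extract_table and SAMPLE are hypothetical, the sample markup is made up for illustration and is not the real structure of the page, and the sketch assumes the target table contains no nested tables.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of the table with a given id (hypothetical helper)."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.inside = False  # True while inside the target table
        self.rows = []       # finished rows, each a list of cell strings
        self.row = None      # cells of the row currently being read
        self.cell = None     # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == self.table_id:
            self.inside = True
        elif self.inside and tag == "tr":
            self.row = []
        elif self.inside and tag in ("td", "th") and self.row is not None:
            self.cell = []

    def handle_endtag(self, tag):
        if not self.inside:
            return
        if tag in ("td", "th") and self.cell is not None:
            # Join the fragments and collapse runs of whitespace
            self.row.append(" ".join("".join(self.cell).split()))
            self.cell = None
        elif tag == "tr" and self.row is not None:
            self.rows.append(self.row)
            self.row = None
        elif tag == "table":
            # Assumption: no nested tables, so any </table> ends the target
            self.inside = False

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


def extract_table(html_text, table_id):
    """Return the rows of the table with id `table_id`, or [] if it is absent."""
    parser = TableRows(table_id)
    parser.feed(html_text)
    parser.close()
    return parser.rows


# Made-up sample markup, only to show the shape of the result
SAMPLE = """
<table id="other"><tr><td>ignored</td></tr></table>
<table id="horseTable">
  <tr><th>No.</th><th>Horse</th><th>Draw</th></tr>
  <tr><td>1</td><td> HEALTHY LUCK </td><td>3</td></tr>
  <tr><td>2</td><td>GRACE HEART</td><td>11</td></tr>
</table>
"""

for row in extract_table(SAMPLE, "horseTable"):
    print(row)
```

To run it against the live page you would pass requests.get(url).text in place of SAMPLE. An empty list means no table with that id was present in the HTML that was actually downloaded, which is the same situation the XPath query in the question reports.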

Table data not being extracted using Python
