Question

Currently, I am trying to scrape all the price tables from the following website: http://aeroportos.weebly.com/fuel-prices.html#.W7SatGj7Sbj

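However, I ran into some problems trying to locate the tables with XPath. Also, I am not sure whether I can scrape all the tables in a single script or whether I have to go through them manually.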


Answer 1

The answer lies in writing the right XPath: one that picks up every data row (excluding the header rows) from every table on the page.

The following code should work:

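from selenium import webdriver
from selenium.webdriver.common.by import By


def get_prices():
    url = "http://aeroportos.weebly.com/fuel-prices.html#.W7SM3mj7Sbj"
    driver = webdriver.Firefox()
    driver.implicitly_wait(30)  # wait up to 30 s when locating elements
    driver.get(url)
    # Every row that follows a header row containing "Airport", across all tables
    rows = driver.find_elements(
        By.XPATH, '//*[contains(text(), "Airport")]/ancestor::tr/following-sibling::tr')
    prices = []
    for row in rows:
        cells = row.find_elements(By.TAG_NAME, 'td')
        if len(cells) < 5:
            continue  # skip rows that lack the five expected columns
        region, country, code, name, price = (cell.text for cell in cells[:5])
        prices.append((region, country, code, name, price))  # append one tuple
    driver.quit()
    print(prices)
    return prices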

Note: I have not run this code, but it should work. Thanks.
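The code above needs a live Firefox/geckodriver session and was never executed. As an offline illustration of the same idea (one pass that collects every non-header row from every table on the page), here is a minimal sketch that swaps Selenium's XPath for Python's standard-library `html.parser`. It assumes the first row of each table is the header and that tables are not nested; `AllTablesParser`, `extract_rows` and the `SAMPLE` markup are hypothetical, and the sample values are placeholders, not real data from the site.

```python
from html.parser import HTMLParser


class AllTablesParser(HTMLParser):
    """Collect the cell text of every data row in every <table> on a page.

    Hypothetical assumptions: the first <tr> of each table is its header,
    and tables are not nested inside one another.
    """

    def __init__(self):
        super().__init__()
        self.rows = []           # data rows from all tables, in page order
        self._rows_in_table = 0  # <tr> elements seen in the current table
        self._row = None         # cell texts of the row being parsed
        self._cell = None        # text fragments of the cell being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._rows_in_table = 0  # new table: its first row is a header
        elif tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:  # ignore text outside table cells
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self._rows_in_table += 1
            if self._rows_in_table > 1:  # skip the header row of each table
                self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return the data rows of all tables in `html` as lists of strings."""
    parser = AllTablesParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Placeholder markup with two tables; the values are made up for illustration.
SAMPLE = """
<table>
  <tr><td>Region</td><td>Country</td><td>Code</td><td>Airport</td><td>Price</td></tr>
  <tr><td>Region A</td><td>Country A</td><td>AAA</td><td>Airport A</td><td>1.00</td></tr>
</table>
<table>
  <tr><td>Region</td><td>Country</td><td>Code</td><td>Airport</td><td>Price</td></tr>
  <tr><td>Region B</td><td>Country B</td><td>BBB</td><td>Airport B</td><td>2.00</td></tr>
</table>
"""

if __name__ == "__main__":
    for region, country, code, name, price in extract_rows(SAMPLE):
        print(region, country, code, name, price)
```

To try it on the real page, you could pass `driver.page_source` from the Selenium session into `extract_rows`; whether the first row of every table on that site is really the header is an assumption worth checking first.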

Scraping different tables with Selenium
