This code is meant to scrape a multi-page data table from a URL, but it only returns the first row.
The code is as follows:
from selenium import webdriver


class DataEngine:
    def __init__(self):
        self.url = 'https://www.investing.com/economic-calendar/house-price-index-147'
        self.driver = webdriver.PhantomJS(r"D:\Projects\Tutorial\Driver\phantomjs-2.1.1-windows\bin\phantomjs.exe")

    def title(self):
        # Print the page heading.
        self.driver.get(self.url)
        titles = self.driver.find_elements_by_xpath('//*[@id="leftColumn"]/h1')
        for title in titles:
            print(title.text)

    def table(self):
        # Print the rows of the history table.
        self.driver.get(self.url)
        # Note: this loop has no exit condition and never moves to another page.
        while True:
            rows = self.driver.find_elements_by_xpath('//*[@id="historicEvent_372690"]')
            for row in rows:
                print(row.text)
Answer 0 (score: 0)
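To make sure your code scrapes every row on the page, change the XPath from
//*[@id="historicEvent_372690"]
to
//*[contains(@id,"historicEvent_")]
The XPath you are currently using matches a single id, so it reads only the first row. The XPath I shared uses the contains function to match every element whose id contains historicEvent_.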
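As a minimal sketch of how that change could be applied, the helper below collects the text of every matching row. The name `collect_row_texts` is hypothetical, and the sketch assumes a Selenium 3-style driver exposing `find_elements_by_xpath`, as in the question's code; newer Selenium releases would use `driver.find_elements(By.XPATH, ...)` instead.

```python
# XPath from the answer: matches every element whose id contains "historicEvent_".
ROW_XPATH = '//*[contains(@id,"historicEvent_")]'


def collect_row_texts(driver):
    """Return the text of every matching row on the page currently loaded.

    Assumption: `driver` is a Selenium 3-style WebDriver (or any object)
    that provides find_elements_by_xpath, as used in the question.
    """
    rows = driver.find_elements_by_xpath(ROW_XPATH)
    # Skip matched elements that have no visible text.
    return [row.text for row in rows if row.text.strip()]
```

Inside the question's `table` method, this could replace the `while True` loop with something like `for text in collect_row_texts(self.driver): print(text)`, which prints each row once and then returns.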