I came across some scraping code (https://docs.python-guide.org/scenarios/scrape/).
I am writing a Selenium-based program that opens https://www.earningswhispers.com/calendar and scrapes all the data from the page. I can use driver.find_element_by_id('epscalendar')
to get the data I need.
But when I try the same lookup with lxml,
companyNames = tree.xpath('//*[@id="epscalendar"]')
it prints Company: []
My Selenium script
import csv
import re
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from datetime import date, time, datetime
driver = webdriver.Chrome()
driver.maximize_window()
driver.get("https://www.earningswhispers.com/calendar")
# Selenium 4 spelling of find_element_by_id('epscalendar')
earningsList = driver.find_element(By.ID, 'epscalendar')
dataOutput = earningsList.text
# Zero-padded HHMMSS strings compare correctly as plain text
if datetime.now().strftime("%H%M%S") >= "090000":
    fileName = 'AMC_earnings_{}'.format(date.today())
else:
    fileName = 'BMO_earnings_{}'.format(date.today())
# newline='' keeps the csv module from emitting blank rows on Windows
with open(fileName + ".csv", mode='w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    finalOutput = re.sub(r"Beat\nMeet\nMiss", "\n", dataOutput)
    # finalOutput = re.sub(r"Company\nEstimate\nActual\nGrowth\nGuidance\nScore\nSurprise\n", "\n", dataOutput)
    writer.writerow([finalOutput])
    print(finalOutput)
driver.close()
HTML scraper code
from lxml import html
import requests
page = requests.get('https://www.earningswhispers.com/calendar')
tree = html.fromstring(page.content)
# cssselect() requires the separate cssselect package to be installed
companyNames = tree.cssselect('div.company')
print('Company:', companyNames)
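One way to narrow this down is to check whether the HTML that requests receives contains the epscalendar element at all. The sketch below rests on an assumption that has not been confirmed for this site: that the calendar is filled in by JavaScript after the page loads, so a real browser sees it while a plain HTTP request does not. The names IdFinder and has_element_id are hypothetical helpers, and the two HTML strings are made-up stand-ins rather than real responses. It uses only the standard library.

```python
from html.parser import HTMLParser


class IdFinder(HTMLParser):
    """Records whether any start tag carries the wanted id attribute."""

    def __init__(self, wanted_id):
        super().__init__()
        self.wanted_id = wanted_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag
        if dict(attrs).get("id") == self.wanted_id:
            self.found = True


def has_element_id(html_text, wanted_id):
    """Return True if html_text contains an element whose id equals wanted_id."""
    finder = IdFinder(wanted_id)
    finder.feed(html_text)
    return finder.found


# Made-up stand-ins for the two sources of HTML (not real site responses)
static_html = "<html><body><div id='main'></div></body></html>"
rendered_html = "<html><body><div id='epscalendar'>...</div></body></html>"

print(has_element_id(static_html, "epscalendar"))    # False
print(has_element_id(rendered_html, "epscalendar"))  # True
```

Against the real page, the same check could be run on page.text from requests and on driver.page_source from Selenium. If the first is False and the second is True, the element only exists after the browser has rendered the page, which would explain the empty list.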