Question

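I came across some code for content scraping (https://docs.python-guide.org/scenarios/scrape/).

I am developing a Selenium-based program that goes to https://www.earningswhispers.com/calendar and scrapes all the data from the site. I can get the data I need with the driver call find_element_by_id('epscalendar').

But when I try the same approach with companyNames = tree.xpath('//*[@id="epscalendar"]'), the output is Company: []

My Selenium script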

import csv
import re
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from datetime import date, time, datetime

driver = webdriver.Chrome()

driver.maximize_window()
driver.get("https://www.earningswhispers.com/calendar")

# Equivalent to find_element_by_id('epscalendar'), which was removed in Selenium 4.3
earningsList = driver.find_element(By.ID, 'epscalendar')
dataOutput = earningsList.text

# Name the file by time of day: 09:00 or later is AMC, earlier is BMO
if datetime.now().strftime("%H%M%S") >= "090000":
    fileName = 'AMC_earnings_{}'.format(date.today())
else:
    fileName = 'BMO_earnings_{}'.format(date.today())

# newline='' stops the csv module from writing extra blank lines on Windows
with open(fileName + ".csv", mode='w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    finalOutput = re.sub(r"Beat\nMeet\nMiss", "\n", dataOutput)
    # finalOutput = re.sub(r"Company\nEstimate\nActual\nGrowth\nGuidance\nScore\nSurprise\n", "\n", dataOutput)
    writer.writerow([finalOutput])

print(finalOutput)
driver.close()
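
The file-naming and CSV-writing steps of the script above could be sketched as two small helpers. This is only a minimal sketch and not part of the original script: `earnings_file_name` and `write_lines_as_rows` are hypothetical names, the 09:00 cutoff is taken from the script, and writing one CSV row per line of scraped text (instead of one cell holding everything) is an assumption about the desired output.

```python
import csv
import io
from datetime import datetime, time


def earnings_file_name(now):
    """Return 'AMC_earnings_<date>' at or after 09:00, else 'BMO_earnings_<date>'."""
    # Compare time objects directly instead of formatted strings.
    prefix = 'AMC' if now.time() >= time(9, 0, 0) else 'BMO'
    return '{}_earnings_{}'.format(prefix, now.date())


def write_lines_as_rows(text, csv_file):
    """Write each non-empty line of text as its own single-column CSV row."""
    writer = csv.writer(csv_file)
    for line in text.splitlines():
        if line.strip():
            writer.writerow([line.strip()])


if __name__ == '__main__':
    print(earnings_file_name(datetime.now()))
    # An in-memory buffer stands in for the real output file.
    buffer = io.StringIO()
    write_lines_as_rows('AAPL\n\nMSFT', buffer)
    print(buffer.getvalue())
```

In the script, `write_lines_as_rows(finalOutput, csv_file)` would take the place of the single `writer.writerow([finalOutput])` call, if one row per line is what is wanted.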

HTML scraper code

from lxml import html
import requests

page = requests.get('https://www.earningswhispers.com/calendar')
tree = html.fromstring(page.content)

companyNames = tree.cssselect('div.company')

print('Company:', companyNames)
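
One thing that may be worth checking is whether the element is present at all in the HTML that requests receives. Selenium reads the page after the browser has run its scripts, while requests only sees the raw response. Whether that is the cause here is an assumption and has not been confirmed for this site. The sketch below uses only the standard library; `IdFinder` and `has_element_with_id` are hypothetical names, and the two HTML strings are made-up stand-ins, not actual responses from the site.

```python
from html.parser import HTMLParser


class IdFinder(HTMLParser):
    """Record whether any start tag carries the given id attribute."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if dict(attrs).get('id') == self.target_id:
            self.found = True


def has_element_with_id(html_text, target_id):
    """Return True if html_text contains a tag whose id equals target_id."""
    finder = IdFinder(target_id)
    finder.feed(html_text)
    return finder.found


if __name__ == '__main__':
    # Made-up markup: one page without the element, one with it.
    static_html = '<html><body><div id="placeholder"></div></body></html>'
    rendered_html = ('<html><body><div id="epscalendar">'
                     '<div class="company">ACME</div></div></body></html>')
    print(has_element_with_id(static_html, 'epscalendar'))
    print(has_element_with_id(rendered_html, 'epscalendar'))
```

With the code in the question, calling `has_element_with_id(page.text, 'epscalendar')` would show whether the id appears in the response that lxml is parsing.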

Trying to scrape data on the Earnings Whisper Calendar page, but cannot find the element I need to ping

0 answers: