Question

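I am trying to extract historical and forecast hourly energy prices from the following URL: https://hourlypricing.comed.com/pricing-table-today/

I was able to do this for the other table on the site, namely tomorrow's forecast prices: https://hourlypricing.comed.com/pricing-table-tomorrow/

...but so far, handling the drop-down menu has been a bit of a pain.

I don't fully understand how to do this with the date picker. What I want is to pull the data for all of 2018. When I use Selenium IDE to record the steps, the year does not increment at all in recording mode, yet changing the date works fine when I am not recording. Any pointers on how to solve this would be much appreciated. As far as I understand it so far, I should be able to record the commands in the IDE instead of writing the same code in Python?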

from pprint import pprint

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

# Run Chrome without opening a browser window.
options = webdriver.ChromeOptions()
options.add_argument('--headless')

driver = webdriver.Chrome(options=options)
#driver = webdriver.Firefox()

driver.get('https://hourlypricing.comed.com/pricing-table-tomorrow/')

# Grab the rendered price table and hand its HTML to BeautifulSoup.
table = driver.find_element(By.CLASS_NAME, 'prices')
tablehtml = table.get_attribute('outerHTML')
soup = BeautifulSoup(tablehtml, 'html.parser')
table = soup.find("table", {"class": "prices"})
#print(table)
table_body = table.find('tbody')
#print(table_body)

data = []
rows = table_body.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    if len(cols) < 2:
        continue  # skip rows that have no price cell
    # Drop the trailing unit character from the price column.
    cols[1] = cols[1][:-1]
    data.append([ele for ele in cols if ele])

# Sort by price numerically; sorting the raw strings would put "10.0" before "2.0".
sortedData = sorted(data, key=lambda row: float(row[1]))

pprint(sortedData)

driver.quit()

Answer 1

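There is no need to spend a lot of time driving the calendar and picking each day. Instead, you can go straight to the source of the data, parse the output of fetch() with BeautifulSoup, and retrieve all the information you want :)

We work out how many days there are in each month and pass that list to the GET request to retrieve each day, all inside a loop over the 12 months. You can adapt it to earlier years as needed.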

import calendar

import requests

def getDays(month):
    # Return the day numbers (1..n) for the given month of 2018.
    days = []
    for week in calendar.monthcalendar(2018, month):
        for day in week:
            # monthcalendar pads each week with 0 for days outside the month.
            if day > 0:
                days.append(day)
    return days

def fetch(days, month):
    # Request the pricing table for each day and print the raw response.
    for d in days:
        # The date parameter is yyyymmdd, so zero-pad the month and the day.
        date = "2018{:02d}{:02d}".format(month, d)
        resp = requests.get("https://hourlypricing.comed.com/rrtp/ServletFeed?type=pricingtabledual&date=" + date)
        source = resp.content
        print(source)

# Loop over all 12 months, January through December.
for month in range(1, 13):
    dayList = getDays(month)
    fetch(dayList, month)
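
The month and day bookkeeping above can also be handled by datetime, which takes care of month lengths and zero padding. The following is only a sketch: iter_dates, feed_urls and FEED_URL are names made up for this illustration, and the URL prefix is simply the one used in the answer's code. It builds the list of URLs without sending any requests.

```python
from datetime import date, timedelta

# URL prefix taken from the answer above; the date is appended as yyyymmdd.
FEED_URL = "https://hourlypricing.comed.com/rrtp/ServletFeed?type=pricingtabledual&date="

def iter_dates(year):
    """Yield every calendar day of the given year, in order."""
    current = date(year, 1, 1)
    while current.year == year:
        yield current
        current += timedelta(days=1)

def feed_urls(year):
    """Build one feed URL per day; strftime handles the zero padding."""
    return [FEED_URL + d.strftime("%Y%m%d") for d in iter_dates(year)]

urls = feed_urls(2018)
print(len(urls), urls[0], urls[-1])
```

Each URL in the list can then be passed to requests.get() in a plain loop, and changing the year argument covers earlier years.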

Answer 2

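There is a free API for retrieving historical price information. It lets you specify the range of values you want to retrieve. These are 5-minute prices, but there are several query options and different return formats.

Example GET request for a date range, returning JSON: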

https://hourlypricing.comed.com/api?type=5minutefeed&datestart=201712310000&dateend=201812310000

The dates are given in the format: yyyyMMddhhmm
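
As a rough sketch of how such a request URL could be assembled from Python datetime values: the helper name five_minute_feed_url is made up for this example, and mapping "hh" to a 24-hour clock (%H) is an assumption based on the 0000 in the sample URL.

```python
from datetime import datetime
from urllib.parse import urlencode

API_URL = "https://hourlypricing.comed.com/api"

def five_minute_feed_url(start, end):
    """Return the 5-minute feed URL for the given datetime range."""
    params = {
        "type": "5minutefeed",
        # yyyyMMddhhmm; %H (24-hour clock) is an assumption, see the note above.
        "datestart": start.strftime("%Y%m%d%H%M"),
        "dateend": end.strftime("%Y%m%d%H%M"),
    }
    return API_URL + "?" + urlencode(params)

url = five_minute_feed_url(datetime(2017, 12, 31, 0, 0), datetime(2018, 12, 31, 0, 0))
print(url)
```

The resulting string can be passed to requests.get() and the body decoded as JSON.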

API information is here:

https://hourlypricing.comed.com/hp-api/

JSON: returns an array whose elements contain the UTC millis and the price.

[
{"millisUTC":"1434686700000","price":"2.0"},
{"millisUTC":"1434686100000","price":"2.5"},
{"millisUTC":"1434685800000","price":"2.5"}
]
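
Since the question asks for hourly prices and this feed is in 5-minute steps, here is a minimal sketch of turning a payload of the shape shown above into timestamps and per-hour averages. parse_feed and hourly_average are hypothetical helper names, the sample string just repeats the three records from the example, and averaging within each UTC hour is an assumption about what "hourly" should mean here, not something the API documentation is quoted as saying.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone

# Sample payload in the shape shown above.
sample = '''[
{"millisUTC":"1434686700000","price":"2.0"},
{"millisUTC":"1434686100000","price":"2.5"},
{"millisUTC":"1434685800000","price":"2.5"}
]'''

def parse_feed(text):
    """Convert the feed into (UTC datetime, price) tuples, oldest first."""
    points = []
    for item in json.loads(text):
        when = datetime.fromtimestamp(int(item["millisUTC"]) / 1000, tz=timezone.utc)
        points.append((when, float(item["price"])))
    return sorted(points)

def hourly_average(points):
    """Average the 5-minute prices that fall within each UTC hour."""
    buckets = defaultdict(list)
    for when, price in points:
        buckets[when.replace(minute=0, second=0, microsecond=0)].append(price)
    return {hour: sum(prices) / len(prices) for hour, prices in sorted(buckets.items())}

points = parse_feed(sample)
hourly = hourly_average(points)
for hour, price in hourly.items():
    print(hour.isoformat(), price)
```

With a real response, the text returned by the GET request would take the place of sample.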

Answer 3

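I don't fully understand how to do this with the date picker.

Others have mentioned solutions that work around the date picker, and that is the best option, if it is possible. But if you do need to automate date pickers with Selenium IDE++, see here. This OCR approach works well for me and is quick to implement.

Open the date picker
Restrict the computer vision area to the date control
Have the IDE find and click the day number, e.g. XClick | OCR=text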

How do I scrape historical data using Selenium with a drop-down menu?
