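Second question of the day.
This is the code I have written so far.
I am trying to extract the Settl.Prices and Vol.Exchange columns from this table: https://www.eex.com/en/market-data/power/futures/phelix-at-futures#!/2018/7/3
The result in each row is a mess. I tried to clean it up with re.sub, but I could not keep the digits, commas and dots without losing their positions and the decimal separator. Any ideas on how to store the two columns in two lists?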
from bs4 import BeautifulSoup as soup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
import datetime
import time

# Yesterday's date; subtracting a timedelta also works on the 1st of the month
# (today.day - 1 would produce day 0 there)
yesterday = datetime.date.today() - datetime.timedelta(days=1)
browser = webdriver.Chrome(executable_path=r"C:\Program Files (x86)\Google\Chrome\Application\chromedriver.exe")
my_url = ('https://www.eex.com/en/market-data/power/futures/phelix-at-futures#!/'
          + str(yesterday.year) + '/' + str(yesterday.month) + '/' + str(yesterday.day))
browser.get(my_url)
# Wait until the third tab is clickable, then click it (click() returns None)
WebDriverWait(browser, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "ul.tabs.filter_wrap.clearfix li.ng-scope:nth-child(3)>a"))).click()
# Give the tab's table time to render BEFORE grabbing the HTML
time.sleep(5)
page_html = browser.page_source
browser.close()
page_soup = soup(page_html, "html.parser")

table = page_soup.find('table')
table_rows = table.find_all('tr')
for tr in table_rows:
    td = tr.find_all('td')
    row = [i.text for i in td]
    print(row)
Actual output:
['\n Cal-19\n ', '\n -\n ', '\n -\n ', '\n -\n ', '\n -\n ', '\n 0.51\n ', '\n -\n ', '\n -\n ', '\n 46.15\n ', '\n -\n ', '\n -\n ', '\n 12\n ', '', '\n\n']
['\n\nloading...\n\nan error occurred while loading the chart...\nPlease reload the chart.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nInvalid Date Format: Please use the format YYYY-MM-DD.\n\n\n\n\nx\n\n\n\n\nIntraday Prices\nSettlement Prices\n\n\n\n\n\n\nall series\n\n\n\n\n\n\n\n\n\n\n\n\n\n']
['\n Cal-20\n ', '\n -\n ', '\n -\n ', '\n -\n ', '\n -\n ', '\n 0.54\n ', '\n -\n ', '\n -\n ', '\n 44.62\n ', '\n -\n ', '\n -\n ', '\n 1\n ', '', '\n\n']
['\n\nloading...\n\nan error occurred while loading the chart...\nPlease reload the chart.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nInvalid Date Format: Please use the format YYYY-MM-DD.\n\n\n\n\nx\n\n\n\n\nIntraday Prices\nSettlement Prices\n\n\n\n\n\n\nall series\n\n\n\n\n\n\n\n\n\n\n\n\n\n']
['\n Cal-21\n ', '\n -\n ', '\n -\n ', '\n -\n ', '\n -\n ', '\n 0.65\n ', '\n -\n ', '\n -\n ', '\n 43.70\n ', '\n -\n ', '\n -\n ', '\n -\n ', '', '\n\n']
['\n\nloading...\n\nan error occurred while loading the chart...\nPlease reload the chart.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nInvalid Date Format: Please use the format YYYY-MM-DD.\n\n\n\n\nx\n\n\n\n\nIntraday Prices\nSettlement Prices\n\n\n\n\n\n\nall series\n\n\n\n\n\n\n\n\n\n\n\n\n\n']
['\n Cal-22\n ', '\n -\n ', '\n -\n ', '\n -\n ', '\n -\n ', '\n 0.55\n ', '\n -\n ', '\n -\n ', '\n 45.08\n ', '\n -\n ', '\n -\n ', '\n -\n ', '', '\n\n']
['\n\nloading...\n\nan error occurred while loading the chart...\nPlease reload the chart.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nInvalid Date Format: Please use the format YYYY-MM-DD.\n\n\n\n\nx\n\n\n\n\nIntraday Prices\nSettlement Prices\n\n\n\n\n\n\nall series\n\n\n\n\n\n\n\n\n\n\n\n\n\n']
['\n Cal-23\n ', '\n -\n ', '\n -\n ', '\n -\n ', '\n -\n ', '\n 0.55\n ', '\n -\n ', '\n -\n ', '\n 45.85\n ', '\n -\n ', '\n -\n ', '\n -\n ', '', '\n\n']
['\n\nloading...\n\nan error occurred while loading the chart...\nPlease reload the chart.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nInvalid Date Format: Please use the format YYYY-MM-DD.\n\n\n\n\nx\n\n\n\n\nIntraday Prices\nSettlement Prices\n\n\n\n\n\n\nall series\n\n\n\n\n\n\n\n\n\n\n\n\n\n']
['\n Cal-24\n ', '\n -\n ', '\n -\n ', '\n -\n ', '\n -\n ', '\n 0.53\n ', '\n -\n ', '\n -\n ', '\n 46.83\n ', '\n -\n ', '\n -\n ', '\n -\n ', '', '\n\n']
['\n\nloading...\n\nan error occurred while loading the chart...\nPlease reload the chart.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nInvalid Date Format: Please use the format YYYY-MM-DD.\n\n\n\n\nx\n\n\n\n\nIntraday Prices\nSettlement Prices\n\n\n\n\n\n\nall series\n\n\n\n\n\n\n\n\n\n\n\n\n\n']
Desired output:
46.15,- (the second value comes from the adjacent column)
44.62,-
43.70,-
45.08,-
45.85,-
46.83,-
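The mess in each row is only the whitespace around the cell text, so a regular expression is not really needed: `str.strip()` removes the newlines and spaces while leaving digits, commas and the decimal point exactly where they were (inside the scraper, BeautifulSoup's `td.get_text(strip=True)` does the same). A minimal sketch working on rows like the ones printed above; the column positions 8 and 9 are an assumption read off the sample output only, not taken from the page's markup, so check them against the table headers before relying on them.

```python
# Two contract rows and one chart-widget row copied from the "Actual output"
# above (the long chart-widget string is shortened here).
SAMPLE_ROWS = [
    ['\n Cal-19\n ', '\n -\n ', '\n -\n ', '\n -\n ', '\n -\n ', '\n 0.51\n ',
     '\n -\n ', '\n -\n ', '\n 46.15\n ', '\n -\n ', '\n -\n ', '\n 12\n ', '', '\n\n'],
    ['\n\nloading...\n\nan error occurred while loading the chart...\n'],
    ['\n Cal-20\n ', '\n -\n ', '\n -\n ', '\n -\n ', '\n -\n ', '\n 0.54\n ',
     '\n -\n ', '\n -\n ', '\n 44.62\n ', '\n -\n ', '\n -\n ', '\n 1\n ', '', '\n\n'],
]

# ASSUMPTION: column positions inferred from the sample rows only:
# index 8 holds the settlement price, index 9 the column next to it.
SETTL_IDX = 8
NEXT_IDX = 9


def split_columns(rows):
    """Return (settlement_prices, adjacent_values) as two lists of strings."""
    settl_prices, adjacent = [], []
    for row in rows:
        # strip() only trims surrounding whitespace; "46.15" stays "46.15"
        cells = [cell.strip() for cell in row]
        # The chart-widget rows have a single cell, so skip short rows
        if len(cells) <= NEXT_IDX:
            continue
        settl_prices.append(cells[SETTL_IDX])
        adjacent.append(cells[NEXT_IDX])
    return settl_prices, adjacent


settl, adjacent = split_columns(SAMPLE_ROWS)
for price, other in zip(settl, adjacent):
    print(f"{price},{other}")
```

The values stay strings because "-" marks an empty cell; convert with `float()` only after filtering those out.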
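Answer (score: 2)
Instead of loading the whole page and pulling the values out of the HTML table, it would be much easier to call the API directly with the right parameters.
The API is: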
https://www.eex.com/data//view/data/detail/ws-power-futures-austrian-v1/{year}/{month}.{day}.json
Example:
https://www.eex.com/data//view/data/detail/ws-power-futures-austrian-v1/2018/06.07.json
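The URL for a given trading day can be built from the template above. A minimal sketch; the two-digit zero-padding of month and day is an assumption based only on the single "06.07" example, and the function name `eex_json_url` is hypothetical.

```python
import datetime

# Template from the answer: {year}/{month}.{day}.json
BASE = "https://www.eex.com/data//view/data/detail/ws-power-futures-austrian-v1"


def eex_json_url(day: datetime.date) -> str:
    # ASSUMPTION: month and day are zero-padded to two digits, as in "06.07"
    return f"{BASE}/{day.year}/{day.month:02d}.{day.day:02d}.json"


print(eex_json_url(datetime.date(2018, 6, 7)))
# Yesterday's file, computed with timedelta so the 1st of a month works too
print(eex_json_url(datetime.date.today() - datetime.timedelta(days=1)))
```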
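It returns JSON, whose data you can then manipulate however you like; essentially, you can use pandas to build a DataFrame from the relevant values. This seems simpler than going through the page itself, and you will not have any trouble with your values.
Some links that may help you work with JSON:
Parsing JSON: Parsing values from a JSON file?
JSON to a pandas DataFrame: JSON to pandas DataFrame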
Update:
I wrote a piece of code that should help you get the idea:
from urllib.request import Request, urlopen
import json

request = Request('https://www.eex.com/data//view/data/detail/ws-power-futures-austrian-v1/2018/06.07.json')
# Download the raw bytes and parse them into Python dicts/lists
with urlopen(request) as response:
    d = json.loads(response.read())

# this first obj corresponds to: P-Power-F-AT-Peak-Quarter
first_obj = d["data"][0]["rows"]
values = []
for row in first_obj:
    # not every row carries a settlement price, so check before reading it
    if "settlementPrice" in row["data"]:
        sp = row["data"]["settlementPrice"]
        values.append(sp)
print(values)
The fetched JSON looks like this:
{
    "data": [
        {
            "identifier": "P-Power-F-AT-Peak-Quarter",
            "rows": [
                {
                    "data": {"param1": value, "param2": value, ...},
                    "contractIdentifier": value
                },
                {
                    "data": {"param1": value, "param2": value, ...},
                    "contractIdentifier": value
                },
                ...
            ]
        },
        {
            "identifier": "P-Power-F-AT-Peak-Month",
            "rows": [
                {
                    "data": {"param1": value, "param2": value, ...},
                    "contractIdentifier": value
                },
                {
                    "data": {"param1": value, "param2": value, ...},
                    "contractIdentifier": value
                },
                ...
            ]
        },
        {
            "identifier": "P-Power-F-AT-Base-Year",
            "rows": [
                {
                    "data": {"param1": value, "param2": value, ...},
                    "contractIdentifier": value
                },
                {
                    "data": {"param1": value, "param2": value, ...},
                    "contractIdentifier": value
                },
                ...
            ]
        },
        ...
    ]
}
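Because every object has the same `identifier` / `rows` / `data` shape, the whole document can be flattened into one record per contract row, which is the form `pandas.DataFrame` accepts directly. A minimal sketch; the identifiers below are the ones from the answer, but the contract names and prices are made up for illustration, and `flatten` is a hypothetical helper name.

```python
# Hypothetical document in the shape sketched above (contract names and
# prices are invented placeholders, not real EEX data).
SAMPLE = {
    "data": [
        {"identifier": "P-Power-F-AT-Peak-Quarter",
         "rows": [
             {"data": {"settlementPrice": 50.0}, "contractIdentifier": "contract-1"},
             {"data": {}, "contractIdentifier": "contract-2"},
         ]},
        {"identifier": "P-Power-F-AT-Base-Year",
         "rows": [
             {"data": {"settlementPrice": 45.5}, "contractIdentifier": "contract-3"},
         ]},
    ]
}


def flatten(doc):
    """Turn the nested JSON into one flat dict per contract row."""
    records = []
    for product in doc["data"]:
        for row in product["rows"]:
            records.append({
                "identifier": product["identifier"],
                "contract": row.get("contractIdentifier"),
                # .get() yields None where a row has no settlement price
                "settlementPrice": row["data"].get("settlementPrice"),
            })
    return records


for rec in flatten(SAMPLE):
    print(rec)
```

With pandas installed, `pandas.DataFrame(flatten(d))` turns the downloaded document `d` into a table with one column per record key.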
The printed result is:
[53.36, 63.86, 62.63, 46.83, 47.44, 59.28, 58.7]
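So basically, what you do is load the JSON, parse it, and keep the object you want to take data from. In my example I took the first object, at index 0, which corresponds to the identifier "P-Power-F-AT-Peak-Quarter" for 7 June 2018 (the date parameter in the URL string). You can choose which object to use by iterating over the entries in d['data'] and stopping at the one whose identifier you want.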
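Picking the object by identifier instead of by position could look like this. A minimal sketch: `settlement_prices` is a hypothetical helper, and the sample document uses invented prices in the structure shown above.

```python
# Hypothetical sample in the structure shown above (prices are invented)
SAMPLE = {
    "data": [
        {"identifier": "P-Power-F-AT-Peak-Quarter",
         "rows": [{"data": {"settlementPrice": 50.0}},
                  {"data": {}}]},
        {"identifier": "P-Power-F-AT-Base-Year",
         "rows": [{"data": {"settlementPrice": 45.5}}]},
    ]
}


def settlement_prices(doc, identifier):
    """Return the settlementPrice values of the object with this identifier."""
    for product in doc["data"]:
        if product["identifier"] == identifier:
            # skip rows that carry no settlement price
            return [row["data"]["settlementPrice"]
                    for row in product["rows"]
                    if "settlementPrice" in row["data"]]
    raise KeyError(f"identifier not found: {identifier}")


print(settlement_prices(SAMPLE, "P-Power-F-AT-Base-Year"))
```

Raising `KeyError` for an unknown identifier makes a typo in the name fail loudly instead of silently returning an empty list.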
If you want to know what the parameter names are, just open the URL in your browser, or download the JSON file and open it in your favorite editor.
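The parameter names can also be listed programmatically by collecting the keys of every row's `data` dict. A minimal sketch; `parameter_names` is a hypothetical helper, and the sample reuses the `param1` / `param2` placeholders from the structure above plus `settlementPrice`, with invented values.

```python
# Hypothetical sample: placeholder keys from the structure above, fake values
SAMPLE = {
    "data": [
        {"identifier": "P-Power-F-AT-Peak-Quarter",
         "rows": [{"data": {"settlementPrice": 50.0, "param1": 1}},
                  {"data": {"param2": 2}}]},
    ]
}


def parameter_names(doc):
    """Collect every key that appears in any row's "data" dict."""
    names = set()
    for product in doc["data"]:
        for row in product["rows"]:
            names.update(row["data"].keys())
    return sorted(names)


print(parameter_names(SAMPLE))
```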
Hope this helps.