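I'm trying to use the code below, but every iteration gives me the data for the first selected date, so I get the same table five times.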
```python
import re
import sys
import datetime
import mechanicalsoup


def get_EminiTable(soup):
    allValues = []
    # Header row: the non-trivial lines of the <thead> text
    colnames = soup.find('thead').get_text()
    allValues.append([i for i in colnames.split('\n') if len(i) > 1])
    lnr = 0
    for line in soup.tbody.find_all('tr'):
        allValues.append([i for i in line.get_text().split('\n') if len(i) > 0])
        # Drop the row just added if one of its cells is 'UNCH'
        if 'UNCH' in allValues[-1]:
            allValues.pop()
        # Stop once three body rows have been processed
        if lnr > 1:
            break
        lnr += 1
    return allValues


def get_settldays(soup):
    # Collect every mm/dd/yyyy date listed in the trade-date <select>
    select = soup.find('select', id="cmeTradeDate")
    settlDays = re.findall(r'\d\d/\d\d/\d\d\d\d', str(select))
    return [datetime.datetime.strptime(adat, '%m/%d/%Y') for adat in settlDays]


url = "http://www.cmegroup.com/trading/equity-index/us-index/e-mini-sandp500_quotes_settlements_futures.html"
browser = mechanicalsoup.StatefulBrowser()
if browser.open(url).status_code != 200:
    sys.exit('Error')
soup = browser.get_current_page()
settlDays = get_settldays(soup)
for adate in settlDays:
    # Select the options form, set the trade date and submit it
    form = browser.select_form('form[id="quotesoptionsform1"]')
    form.set("tradeDate", adate.strftime('%m/%d/%Y'))
    browser.submit_selected()
    soup = browser.get_current_page()
    tabvals = get_EminiTable(soup)
    print(adate)
    for each in tabvals:
        print(each)
browser.close()
```
Any ideas on how to get the correct table for each date, or is this a bug in MechanicalSoup?
Answer (score: 0)
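I suspect the reason is that this form is handled by JavaScript in the page rather than over HTTP, so submitting it with an HTTP request (which is how MechanicalSoup works) doesn't actually do anything.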
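MechanicalSoup submits the form by building a request URL with the form fields in the query string. If you open such a URL, you can see that the query string `&action=Submit&tradeDate=12%2F15%2F2017` is ignored. However, if we set the trade date manually in a real web browser, we see that it appends `#tradeDate=12/15/2017` to the URL.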
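To see concretely why the two URLs behave differently, here is a small standard-library sketch using `urllib.parse`. It assumes the form is submitted with GET, as the query string above suggests. `urlencode` produces the same percent-encoded query string, while `urldefrag` splits off the `#tradeDate=...` part, which is a URL fragment: the browser keeps it on the client side and does not send it to the server.

```python
from urllib.parse import urlencode, urldefrag

url = ("http://www.cmegroup.com/trading/equity-index/us-index/"
       "e-mini-sandp500_quotes_settlements_futures.html")

# Form fields encoded as a query string, as a GET form submission would do.
# urlencode's default quoting turns '/' into %2F.
query = urlencode({"action": "Submit", "tradeDate": "12/15/2017"})
print(query)  # action=Submit&tradeDate=12%2F15%2F2017

# A real browser instead puts the date in the fragment (after '#').
browser_url = url + "#tradeDate=12/15/2017"
base, fragment = urldefrag(browser_url)
print(base == url)  # True: only this part identifies the resource on the server
print(fragment)     # tradeDate=12/15/2017 (read only by the page's JavaScript)
```

So the server sees the same URL for every date, and only client-side JavaScript can act on the fragment.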
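Even if I copy that URL from the web browser and open it with MechanicalSoup, the page does not show the chosen date: everything after `#` is a URL fragment that is never sent to the server, and the JavaScript that would read it is not executed. You can see this by modifying the last `for` loop as follows: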
```python
for adate in settlDays:
    # Open the URL for each date directly
    date = adate.strftime('%m/%d/%Y')
    resp = browser.open(url + "#tradeDate={}".format(date))
    # Make sure we constructed the URL correctly
    print(resp.url)
    # Print the date that is being displayed
    soup = browser.get_current_page()
    print(soup.find('select', id='cmeTradeDate').find('option', attrs={'selected': 'selected'}).text)
```
The output looks like this:
```text
http://www.cmegroup.com/trading/equity-index/us-index/e-mini-sandp500_quotes_settlements_futures.html#tradeDate=12/21/2017
Thursday, 21 Dec 2017 (Prelim)
http://www.cmegroup.com/trading/equity-index/us-index/e-mini-sandp500_quotes_settlements_futures.html#tradeDate=12/20/2017
Thursday, 21 Dec 2017 (Prelim)
http://www.cmegroup.com/trading/equity-index/us-index/e-mini-sandp500_quotes_settlements_futures.html#tradeDate=12/19/2017
Thursday, 21 Dec 2017 (Prelim)
http://www.cmegroup.com/trading/equity-index/us-index/e-mini-sandp500_quotes_settlements_futures.html#tradeDate=12/18/2017
Thursday, 21 Dec 2017 (Prelim)
http://www.cmegroup.com/trading/equity-index/us-index/e-mini-sandp500_quotes_settlements_futures.html#tradeDate=12/15/2017
Thursday, 21 Dec 2017 (Prelim)
```
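Since the server returns the same default date whatever date is requested, it can help to check which date a page actually displays before parsing its table, instead of silently collecting duplicates. The sketch below does this with the standard library's `html.parser`. It assumes (hypothetically, to be checked against the real markup) that each `<option>` of the `cmeTradeDate` select carries its date as a `value` in `mm/dd/yyyy` form. `SelectedOptionParser` and `page_matches_date` are illustrative names, not part of MechanicalSoup.

```python
from html.parser import HTMLParser


class SelectedOptionParser(HTMLParser):
    """Records the value of the selected <option> inside one <select>."""

    def __init__(self, select_id):
        super().__init__()
        self.select_id = select_id
        self.in_select = False
        self.selected_value = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and attrs.get("id") == self.select_id:
            self.in_select = True
        elif tag == "option" and self.in_select and "selected" in attrs:
            # Assumption: the value attribute holds the mm/dd/yyyy date
            self.selected_value = attrs.get("value")

    def handle_endtag(self, tag):
        if tag == "select":
            self.in_select = False


def page_matches_date(page_html, requested, select_id="cmeTradeDate"):
    """Return True only if the page's selected trade date equals `requested`."""
    parser = SelectedOptionParser(select_id)
    parser.feed(page_html)
    parser.close()
    return parser.selected_value == requested


# Hypothetical markup shaped like the output above: 12/21/2017 is selected.
page = (
    '<select id="cmeTradeDate">'
    '<option value="12/21/2017" selected="selected">Thursday, 21 Dec 2017 (Prelim)</option>'
    '<option value="12/20/2017">Wednesday, 20 Dec 2017</option>'
    "</select>"
)
print(page_matches_date(page, "12/21/2017"))  # True
print(page_matches_date(page, "12/20/2017"))  # False
```

In the MechanicalSoup loop, `str(browser.get_current_page())` could be passed as `page_html`; dates for which the check fails can then be reported or skipped rather than printed as a copy of the first table.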
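Unless someone knows another way to handle this page over plain HTTP, I think your best option is a tool that can run the page's JavaScript, such as Selenium, which drives a real browser.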