I am trying to submit a custom date range through an HTML form so that I can scrape the data within that range. The HTML looks like this:
<div class="dateRange inlineblock datePickerBinder arial_11 lightgrayFont"
id="widgetFieldDateRange">03/19/2019 - 04/18/2019</div>
</div>
<input id="picker" type="hidden" value=" 03/19/2019 - 04/18/2019">
I tried the following:
import requests
import urllib.parse as urlParse
from bs4 import BeautifulSoup

url = 'https://www.investing.com/funds/lansforsakringar-global-indexnara-historical-data'
values = {'start': '01/18/2019', 'end': '04/18/2019'}
# pretend to be a chrome 47 browser on a windows 10 machine
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36",
    "Accept": "text/plain, */*; q=0.01",
    "Content-Type": "application/x-www-form-urlencoded",
    "X-Requested-With": "XMLHttpRequest"
}
# encode values for the request body
params = urlParse.urlencode(values).encode("utf-8")
# send the POST request
s = requests.Session()
targetUrl = s.post(url=url, data=params, headers=headers)
# parse the response
html = BeautifulSoup(targetUrl.content, "html.parser")
# print the parsed response
print(html.prettify())
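However, when I print the response, I see that the default date range is still set and my custom date range has not been applied. How can I fix this?
I also found this, which I believe is the JavaScript that posts the dates: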
<script type="text/javascript">
window.siteData = {
    htmlDirection: 'ltr',
    decimalPoint: '.' || '.',
    thousandSep: ',' || ',',
    isEu : false,
    userLoggedIn: false,
    userHasPhoneRegistered: false,
    currencyPosition: 'left',
    datepicker: {
        applyButton: 'Apply',
        format: 'm/d/Y',
        formatShort: 'm/d/y',
        formatLong: 'm/d/Y',
        formatSend: 'yy-mm-dd',
        firstDay: '1',
        dayNames: ["Su","Mo","Tu","We","Th","Fr","Sa"],
        monthNamesShort: ["Jan.", "Feb.", "Mar.", "Apr.", "May", "Jun.", "Jul.", "Aug.", "Sep.", "Oct.", "Nov.", "Dec."],
        monthNames: ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"],
        translations: {
            custom: 'Custom dates',
            start: 'Start Date',
            end: 'End Date'
        }
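Answer 0 (score: 1)
The following should help you click the calendar menu and enter the values with Selenium. The page makes an Ajax POST, but I couldn't pass the right cookies (I think).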
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

d = webdriver.Chrome()
d.get('https://www.investing.com/funds/lansforsakringar-global-indexnara-historical-data')

try:  # attempt to dismiss banners that could block later clicks
    WebDriverWait(d, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".closer"))).click()
    d.find_element(By.CSS_SELECTOR, '.closer').click()
except Exception:
    pass

# find_element(By.ID, ...) is used instead of find_element_by_id, which newer Selenium releases no longer provide
d.find_element(By.ID, 'widgetFieldDateRange').click()  # show the date picker
sDate = d.find_element(By.ID, 'startDate')  # set start date input element into variable
sDate.clear()  # clear existing entry
sDate.send_keys('01/18/2019')  # add custom entry
eDate = d.find_element(By.ID, 'endDate')  # repeat for end date
eDate.clear()
eDate.send_keys('04/18/2019')
d.find_element(By.ID, 'applyBtn').click()  # submit changes
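If the range has to be computed rather than typed by hand, the strings passed to `send_keys` can be built with `datetime`. This is only a sketch: it assumes the picker accepts `mm/dd/yyyy`, which is what the `format: 'm/d/Y'` entry in the page's `siteData` and the values used above suggest, and `picker_range` is a hypothetical helper name, not part of Selenium or the site.

```python
from datetime import date, timedelta


def picker_range(end, days):
    """Return (start, end) as mm/dd/yyyy strings for a window of `days` days ending on `end`."""
    start = end - timedelta(days=days)
    # %m/%d/%Y gives zero-padded month and day with a four-digit year
    return start.strftime('%m/%d/%Y'), end.strftime('%m/%d/%Y')


start_text, end_text = picker_range(date(2019, 4, 18), 90)
print(start_text, end_text)  # 01/18/2019 04/18/2019
```

The two strings could then replace the literals in the `send_keys` calls for the start and end inputs.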
Answer 1 (score: 0)
You can use selenium:
from selenium import webdriver
from bs4 import BeautifulSoup as soup
d = webdriver.Chrome('/path/to/chromedriver')
d.get('https://www.investing.com/funds/lansforsakringar-global-indexnara-historical-data')
r = soup(d.page_source, 'html.parser').find('div', {'id':'widgetFieldDateRange'}).text
Output:
'03/18/2019 - 04/18/2019'
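The widget text comes back as a single string, so it usually needs to be split before it can be compared with the range that was requested. A minimal sketch, assuming the widget always renders `mm/dd/yyyy - mm/dd/yyyy` as in the output above; `parse_range` is a hypothetical helper, not something the page or Selenium provides.

```python
from datetime import datetime


def parse_range(text):
    """Split 'mm/dd/yyyy - mm/dd/yyyy' into a (start, end) pair of date objects."""
    # strip() tolerates stray whitespace such as the leading space in the hidden input's value
    start_text, end_text = (part.strip() for part in text.split(' - '))
    fmt = '%m/%d/%Y'
    return (datetime.strptime(start_text, fmt).date(),
            datetime.strptime(end_text, fmt).date())


start, end = parse_range('03/18/2019 - 04/18/2019')
print(start, end)  # 2019-03-18 2019-04-18
```

Comparing the parsed dates with the requested ones is one way to check whether a custom range was actually applied or the page fell back to its default.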