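I am trying to get historical economic calendar data for 1 February 2020 through 5 February 2020 from this website: https://www.investing.com/economic-calendar/.
Today is 4 February 2020.
If I use the URL https://www.investing.com/economic-calendar/ as shown below, I can extract the table with BeautifulSoup, but I cannot select any date other than the current one. Today (4 February 2020) I saved one table with this Python script.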
import requests
import pandas as pd
from bs4 import BeautifulSoup
payload = {"country[]":["25","32","6","37","72","22","17","39","14","10","35","43","56","36","110","11","26","12","4","5"],
"dateFrom":"2020-02-01",
"dateTo":"2020-02-05",
"timeZone":"8",
"timeFilter":"timeRemain",
"currentTab":"custom",
"limit_from":"0"}
urlheader = {
"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
"X-Requested-With": "XMLHttpRequest"
}
url = "https://www.investing.com/economic-calendar/"
req = requests.post(url, data=payload, headers=urlheader)
print(req)
soup = BeautifulSoup(req.content, "lxml")
table = soup.find('table', id="economicCalendarData")
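I can see that whenever I change the date range or the filter settings, the page sends a POST request to "https://www.investing.com/economic-calendar/Service/getCalendarFilteredData".
This is the request data I found.
This is the POST URL.
So I switched to the code below, because I want to choose the dates.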
import requests
import pandas as pd
from bs4 import BeautifulSoup
payload = {"country[]":["25","32","6","37","72","22","17","39","14","10","35","43","56","36","110","11","26","12","4","5"],
"dateFrom":"2020-02-01",
"dateTo":"2020-02-05",
"timeZone":"8",
"timeFilter":"timeRemain",
"currentTab":"custom",
"limit_from":"0"}
urlheader = {
"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
"X-Requested-With": "XMLHttpRequest"
}
url = "https://www.investing.com/economic-calendar/Service/getCalendarFilteredData"
req = requests.post(url, data=payload, headers=urlheader)
print(req)
soup = BeautifulSoup(req.content, "lxml")
table = soup.find('table', id="economicCalendarData")
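This time, however, there is no economicCalendarData element, so the table variable is empty. The soup variable contains data, but not the table data.
This is the table I want to save.
As I said above, if I use https://www.investing.com/economic-calendar/ as the URL, I only get the table for the current day (4 February 2020), no matter which dates I put in the payload (dateFrom, dateTo).
For some reason, when I post to https://www.investing.com/economic-calendar/Service/getCalendarFilteredData, the table comes back empty; the soup variable does contain data, but not the data I requested. What am I doing wrong? How can I save the table for the dates I choose?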
Answer 0 (score: 2)
You are really close. If I understand your requirement correctly, the following should get you there:
import requests
from bs4 import BeautifulSoup
url = "https://www.investing.com/economic-calendar/Service/getCalendarFilteredData"
payload = {"country[]":["25","32","6","37","72","22","17","39","14","10","35","43","56","36","110","11","26","12","4","5"],
"dateFrom":"2020-02-01",
"dateTo":"2020-02-05",
"timeZone":"8",
"timeFilter":"timeRemain",
"currentTab":"custom",
"limit_from":"0"}
req = requests.post(url, data=payload, headers={
"User-Agent":"Mozilla/5.0",
"X-Requested-With": "XMLHttpRequest"
})
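# The response is JSON; its 'data' field holds the table rows as an HTML string.
soup = BeautifulSoup(req.json()['data'], "lxml")
# Print the header and data cells of each row as a list of strings.
for items in soup.select("tr"):
    data = [item.get_text(strip=True) for item in items.select("th,td")]
    print(data)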
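To see why the lookup for economicCalendarData came back empty, here is a minimal offline sketch. It assumes, as the code above suggests, that the endpoint replies with a JSON object whose 'data' key holds an HTML string of bare <tr> rows with no wrapping table. The sample body and its cell values are made up for illustration, and the RowCollector class and rows_from_response function are hypothetical helpers built on the standard library's html.parser rather than BeautifulSoup, so the sketch runs without network access or extra packages.

```python
import json
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text chunks of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left open by malformed markup


def rows_from_response(body):
    """Decode the JSON envelope first, then parse the HTML under 'data'."""
    fragment = json.loads(body)["data"]
    collector = RowCollector()
    collector.feed(fragment)
    collector.close()
    return collector.rows


# Made-up stand-in for req.text; the cell values are placeholders.
sample_body = json.dumps({
    "data": "<tr><td>Day 1</td></tr>"
            "<tr><td>09:45</td><td>CUR</td><td>Example event</td></tr>"
})

# The table id never appears in this body, so a lookup by that id finds nothing.
print("economicCalendarData" in sample_body)
for row in rows_from_response(sample_body):
    print(row)
```

With a live response you would pass req.text instead of sample_body. The key point is the order of operations: decode the JSON first, then parse the HTML string it contains, and select rows directly instead of looking for the table by id.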