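I have been trying to use the `requests` module to access a .txt file on a website. When I log in manually with my username and password, I can see the actual data in the browser: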
Point Code Issue Date Trade Date Region Pricing Point Low High Average Volume Deals Delivery Start Date Delivery End Date
RMTNWW 2018-10-09 2018-10-08 Rocky Mountains Northwest Wyoming Pool 2.910 2.955 2.935 323 44 2018-10-09 2018-10-09
RMTOPAL 2018-10-09 2018-10-08 Rocky Mountains Opal 2.925 3.050 2.965 209 40 2018-10-09 2018-10-09
But when I try to access the same page from a script and print its content with
print(page.content)
the output is the HTML source of a page instead:
b'<!DOCTYPE html>\n<html>\n<head>\n\n<meta name="csrf-param" content="authenticity_token"/>\n<meta name="csrf-token" content="s35g4TAUN6+5V8Xi8x7u6f2FwziX3gbW9iY9D45HnEw="/>\n<meta http-equiv="content-type" content="text/html;charset=utf-8">
\n<meta name="description" content="Natural Gas Intelligence">\n<meta name="keywords" content="gas, natural gas, natural gas prices, enery prices, NYMEX, nymex settlement, aga, storage, natural gas data, henry hub, ferc, power, electricity, electric, megawatt, methane, reliability, inside, ngi">\n\n\n\n<meta content="false" name="has-log-view" />\n<!--<meta content="IE=EmulateIE7" http-equiv="X-UA-Compatible"/>
.
.
.
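Nothing in this HTML contains any of the column labels shown above (Point Code, Issue Date, etc.), so I think this may be a login problem. The login URL is https://www.naturalgasintel.com/user/login, and the file is located at https://www.naturalgasintel.com/ext/resources/Data-Feed/Daily-GPI/2018/10/20181009td.txt.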
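One way to confirm the suspicion that the server is returning a login or landing page rather than the data file is to look at the start of the response body. The following is only a minimal sketch; `looks_like_html` is a hypothetical helper name, not part of `requests`.

```python
def looks_like_html(content):
    """Return True if a response body (bytes) starts like an HTML document."""
    # Only the first few bytes matter; ignore leading whitespace and case.
    head = content[:200].lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")
```

With the script in this question, `looks_like_html(page.content)` returning True would suggest the session was not authenticated when the file was requested.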
My script is:
import requests

with requests.Session() as c:
    data_url = 'https://naturalgasintel.com/ext/resources/Data-Feed/Daily-GPI/'
    username = ''
    password = ''
    login_data = dict(username=username, password=password)
    c.post(data_url, data=login_data, headers={'Referer':'https://www.naturalgasintel.com/'})
    page = c.get('https://www.naturalgasintel.com/ext/resources/Data-Feed/Daily-GPI/2018/10/20181009td.txt', stream=True)
    print(page.content)
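Two things stand out in the script: the credentials are posted to the data directory rather than to the login URL, and the returned HTML carries `csrf-param` and `csrf-token` meta tags, which usually means the login form expects that token. The sketch below shows one possible flow under those assumptions. The helper names `extract_csrf` and `login_and_fetch` are hypothetical, and the form field names `username` and `password` are simply carried over from the script above; the real field names have to be read from the login form itself.

```python
from html.parser import HTMLParser


class _MetaCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            if "name" in attrs and "content" in attrs:
                self.meta[attrs["name"]] = attrs["content"]


def extract_csrf(html):
    """Return (param_name, token) from the csrf-param / csrf-token meta tags."""
    collector = _MetaCollector()
    collector.feed(html)
    return collector.meta.get("csrf-param"), collector.meta.get("csrf-token")


def login_and_fetch(session, login_url, file_url, username, password):
    """Log in through login_url, then fetch file_url with the same session."""
    # Load the login page first so the session picks up cookies and the token.
    login_page = session.get(login_url)
    param, token = extract_csrf(login_page.text)

    # ASSUMPTION: "username" and "password" are guesses taken from the
    # original script; check the form's actual <input name="..."> values.
    payload = {"username": username, "password": password}
    if param and token:
        payload[param] = token

    session.post(login_url, data=payload, headers={"Referer": login_url})
    return session.get(file_url)
```

It would be called with a `requests.Session()` as `session`, the login URL given above as `login_url`, and the .txt URL as `file_url`. Whether this is enough depends on how the site's login form actually works, which I have not been able to confirm.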
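I want to use the `open` function to save the page's actual .txt content rather than the HTML source, so that I can `write` the content to a file with something like: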
localfile = 'output_{}.csv'
datafile = open(localfile, "w", encoding="utf-8")
datafile.write(page.text)
datafile.close()
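For the saving step, a `with` block closes the file even if the write fails. This is a minimal sketch of what I have in mind; `save_feed` and its `label` parameter are hypothetical names, with `label` filling the `{}` placeholder in the file name.

```python
import os


def save_feed(text, label, directory="."):
    """Write text to output_<label>.csv in directory and return the path."""
    path = os.path.join(directory, "output_{}.csv".format(label))
    # newline="" keeps the line endings exactly as they appear in text.
    with open(path, "w", encoding="utf-8", newline="") as datafile:
        datafile.write(text)
    return path
```

For example, `save_feed(page.text, "20181009")` would write the decoded response body to `output_20181009.csv` in the current directory.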
How can I get this content instead of the HTML source?