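I have always relied on your large community to solve my basic problems, and now I am facing one of my own. I have been assigned a task very similar to Andrew's request, namely moving from manual to automatic downloads. For this task I have to write a script that downloads data from EUMETSAT by providing authentication. Please see my attempt below.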
import requests
from lxml import html
# EUMETSAT URL for authentication
url_EUMETSAT = 'http://oiswww.eumetsat.org/SDDI/webapps/publicdcp/logon.jsp'
username = '<USER>'
password = '<PASS>'
# Authentication attempt
EUMETSAT_request = requests.Session()
EUMETSAT_result = EUMETSAT_request.get(url_EUMETSAT)
EUMETSAT_login = {
    "username": username,
    "password": password,
}
CONNEXION_result = EUMETSAT_request.post(url_EUMETSAT, data=EUMETSAT_login)
print(CONNEXION_result.status_code)
# 200 means that the request has been established
# Download of one data file
url_EUMETSAT_DATABASE ='http://oiswww.eumetsat.org/SDDI/webapps/publicdcp/mainMenuAction.do?action=DCP_ADMIN'
DATA_BASE = EUMETSAT_request.get(url_EUMETSAT_DATABASE)
url_file1 ='http://oiswww.eumetsat.org/SDDI/webapps/publicdcp/dcpAdmin.do?action=ACTION_DOWNLOAD&id=1212D0C2'
DATA_FILE1 = EUMETSAT_request.get(url_file1, headers = dict(referer = url_file1))
# Writing of the content in data.txt
filename = 'data.txt'
data = DATA_FILE1.content
with open(filename, 'wb') as open_file:
    open_file.write(data)
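With this script, I expected data.txt to contain my data, just as if I had downloaded it manually. Instead, when I open the file, I find HTML code that starts with the header below.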
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<!-- MUST include prior other includes-->
<script language="javascript" type="text/javascript">
/**
* Environment parameters (MUST BE DEFINED!)
*/
var EUM_SNIPPETS_CFG = new Array();
/*Commonly changed params*/
EUM_SNIPPETS_CFG['titleHigh'] = "Title (not yet customized)";
EUM_SNIPPETS_CFG['titleSub'] = "Tagline (not yet customized)";
EUM_SNIPPETS_CFG['displaySearch'] = true;
EUM_SNIPPETS_CFG['useLocalAssetsPath'] = false;//for isolated assets only
EUM_SNIPPETS_CFG['searchOpensNewWindow'] = false;
EUM_SNIPPETS_CFG['pathWebsite'] = "http://www.eumetsat.int/";
EUM_SNIPPETS_CFG['pathSearch'] = "http://search.eumetsat.int/search";
EUM_SNIPPETS_CFG['externalAssetsDomain'] = "http://dev75.eumetsat.int";//no slash at the end
//for localized version only. If absolute path set, this will be overridden to applicable CMS urls
EUM_SNIPPETS_CFG['path_images'] = "images";//path to image assets
EUM_SNIPPETS_CFG['path_css'] = "css";//path to CSS assets
EUM_SNIPPETS_CFG['path_javascript'] = "javascript";//path to JS assets
</script>
<script language="JavaScript" type="text/javascript">
/**
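To confirm that the server returned a web page (for example a login or error page) instead of the data file, the response can be inspected before it is written to disk. The following is only a minimal sketch: `looks_like_html` is a hypothetical helper name, not part of requests, and it assumes that the server either sets a Content-Type header or starts its HTML pages with a doctype or an `<html>` tag.

```python
def looks_like_html(content_type, body):
    """Return True if a response appears to be an HTML page rather than a data file."""
    # The header may carry parameters, e.g. "text/html; charset=UTF-8"
    media_type = (content_type or "").split(";")[0].strip().lower()
    if media_type in ("text/html", "application/xhtml+xml"):
        return True
    # Fall back to sniffing the first bytes of the body
    head = body[:512].lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


# Sample values shaped like the response described above (not real server output)
sample_body = b'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">\n<html>'
print(looks_like_html("text/html", sample_body))  # prints True
```

With the script above, the check would be `looks_like_html(DATA_FILE1.headers.get("Content-Type"), DATA_FILE1.content)`. A `True` result would suggest that the session was not authenticated when the download URL was requested, although that is an interpretation and not something this check can prove.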
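Second, when I download the file manually, the file name follows a pattern made up of the station name, the date, the hour, or other information. As you can see from the variable url_file1, the file name is not part of the URL.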
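When a file name is not part of the URL, servers often send it in a Content-Disposition response header. Whether this EUMETSAT endpoint does so is an assumption I have not verified. The sketch below shows how such a header could be read with the standard library; the helper name `filename_from_content_disposition` and the sample file name are hypothetical.

```python
import os
from email.message import Message


def filename_from_content_disposition(header_value, default="data.txt"):
    """Extract the file name from a Content-Disposition header value, if any."""
    if not header_value:
        return default
    # Let the standard library parse the header parameters
    msg = Message()
    msg["Content-Disposition"] = header_value
    name = msg.get_filename()
    if not name:
        return default
    # Keep only the final path component so the name cannot point elsewhere
    return os.path.basename(name) or default


# Hypothetical header value, for illustration only
header = 'attachment; filename="STATION_20200101_1200.txt"'
print(filename_from_content_disposition(header))
```

In the script above this would be called as `filename_from_content_disposition(DATA_FILE1.headers.get("Content-Disposition"))`, falling back to data.txt when the header is absent.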
Could you point me in the right direction? I think I am missing something.