python3 - Downloading a PDF file from a URL

Time: 2020-07-15 09:18:20

Tags: html python-3.x pdf python-requests

My python3 code:

import sys

import requests

# URL of the PDF, passed as the first command-line argument
url = sys.argv[1]
r = requests.get(url, stream=True)
chunk_size = 20000
# Stream the response body to disk in chunks
with open('metadata.pdf', 'wb') as fd:
    for chunk in r.iter_content(chunk_size):
        fd.write(chunk)

It saves the content to metadata.pdf, but this is not the real content of the PDF; it is the HTML page shown further below.

Any help on how I can save the actual content of the file instead of this HTML? It should be a real PDF, but when I download it, I get this HTML page.

Update:

When I use a Python session, the answer from the server is:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

<html>
<!-- $HTMLid:   index.html /main/6 11-Jun-2004.13:54:09 $ -->
<head>
<title>Allied Waste</title>

<script language="JavaScript">
<!--
if (top != self) {
        top.location = self.location;
    }
function doRedirect() {
  document.login.submit();
} 

function init () {
    var initChar = /^\?/;
    var list = top.location.search.replace(initChar,"");
    var parms = list.split('&');
    for ( ct=0; ct < parms.length; ct++ ) {
        vals = parms[ct].split('=');
        switch ( vals[0] ) {
            case "unitCode":
                document.login.unitCode.value = unescape(vals[1]);
                if ( document.login.unitCode.value == 'undefined' || document.login.unitCode.value == '' )
                    document.login.unitCode.value = "ALW";
                break;
      default:
        document.login.unitCode.value = "ALW";
                break;
        }
    }
    document.login.submit();
}
//-->
</script>
</head>
<body onload="init()">
  <form name="login" action="inetSrv" method="post">
    <input type="hidden" name="type" value="SignonService"/>
    <input type="hidden" name="action" value="SignonPrompt"/>
    <input type="hidden" name="client" value="701122300"/>
    <input type="hidden" name="unitCode" value=""/>
  </form>
</body>
</html>

1 answer:

Answer 0 (score: 0)

The page appears to be a redirect to a login page. If you can, it would probably be simpler to do this manually.
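One way to confirm that the server sent a login page rather than a PDF is to inspect the first bytes of the response before writing anything to disk. The following is a minimal sketch, not a definitive implementation: the names `PDF_MAGIC`, `looks_like_pdf` and `download_pdf` are hypothetical, and the check assumes the usual case where a PDF file starts with the `%PDF-` signature and the first streamed chunk is at least that long.

```python
PDF_MAGIC = b"%PDF-"  # signature at the start of a typical PDF file


def looks_like_pdf(first_bytes):
    """Return True if the payload starts with the PDF file signature."""
    return first_bytes.startswith(PDF_MAGIC)


def download_pdf(url, path, chunk_size=20000):
    """Save url to path, refusing to write a body that is not a PDF."""
    import requests  # third-party; imported here so looks_like_pdf has no dependencies

    r = requests.get(url, stream=True)
    r.raise_for_status()
    chunks = r.iter_content(chunk_size)
    # Look at the first chunk only; assumed to be long enough to hold the signature
    first = next(chunks, b"")
    if not looks_like_pdf(first):
        raise ValueError(
            "expected a PDF but got Content-Type %r" % r.headers.get("Content-Type")
        )
    with open(path, "wb") as fd:
        fd.write(first)
        for chunk in chunks:
            fd.write(chunk)
```

With the URL from the question, `download_pdf(url, 'metadata.pdf')` would presumably raise `ValueError` instead of silently saving the HTML login page under a .pdf name.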

Otherwise, you will have to handle the login process to retrieve the authentication cookie it (presumably) gives you, and then send that cookie with your request so that the expected PDF becomes available.
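A rough sketch of that approach, under several assumptions: the HTML in the question auto-submits a form named `login` to `inetSrv` with a few hidden fields, and its JavaScript falls back to `unitCode` = "ALW". The code below reads those fields from the page and replays the POST inside a `requests.Session`, which keeps any cookies the server sets. The names `LoginFormParser`, `parse_login_form` and `fetch_pdf_with_session` are hypothetical, and whatever the site asks for after the `SignonPrompt` step (most likely real credentials) is not shown in the question, so it is left out here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LoginFormParser(HTMLParser):
    """Collect the action URL and named input fields of the first <form>."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self._in_form and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def parse_login_form(html, default_unit_code="ALW"):
    """Return (action, fields) for the auto-submitted form in html."""
    parser = LoginFormParser()
    parser.feed(html)
    fields = dict(parser.fields)
    # The page's JavaScript uses "ALW" when no unitCode is supplied
    if "unitCode" in fields and not fields["unitCode"]:
        fields["unitCode"] = default_unit_code
    return parser.action, fields


def fetch_pdf_with_session(url, path):
    """Try to get past the auto-submitted form, then save the PDF."""
    import requests  # third-party

    with requests.Session() as s:
        r = s.get(url)
        if not r.content.startswith(b"%PDF-"):
            action, fields = parse_login_form(r.text)
            # Submit the form as the browser's onload handler would
            s.post(urljoin(r.url, action or ""), data=fields)
            # Any further sign-on step (e.g. credentials) is site-specific
            # and not shown in the question; it would go here.
            r = s.get(url)
        if not r.content.startswith(b"%PDF-"):
            raise RuntimeError("still not a PDF; the site probably needs a real login")
        with open(path, "wb") as fd:
            fd.write(r.content)
```

If the server only needs the session cookie from that first POST, the second `s.get(url)` may return the PDF; if it answers with a sign-on prompt instead, the credentials step has to be added by hand.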