Question

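I get this error when I try to download a large number of pages from a website. The script was pieced together and modified from several other scripts, and as you can probably tell, I am quite new to Python and to programming.

The Python version is 3.4.3 and the requests version is 2.7.0.

Here is the script: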

import os.path

import requests
from bs4 import BeautifulSoup

# Log in once and reuse the session so the cookies carry over to later requests.
s = requests.session()
login_data = {'dest': '/', 'user': '******', 'pass': '******'}
header_info = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0'}
url = 'http://www.oxfordreference.com/LOGIN'
s.post(url, data=login_data, headers=header_info)

downprefix = 'http://www.oxfordreference.com/view/10.1093/acref/9780198294818.001.0001/acref-9780198294818-e-'

for i in range(1, 100):
    downurl = downprefix + str(i)
    r = s.get(downurl, headers=header_info, timeout=30)
    if r.status_code == 200:
        # Keep only the entry content and save it under shorten/
        # (the directory must already exist).
        soup = BeautifulSoup(r.content, "html.parser")
        shorten = str(soup.find_all("div", class_="entryContent"))
        fname = 'acref-9780198294818-e-' + str(i) + '.htm'
        newname = os.path.join('shorten', fname)
        with open(newname, 'w', encoding="utf_8") as htmfile:
            htmfile.write(shorten)
        print('Success in ' + str(i))
    else:
        # Record the entry number of any page that did not return 200.
        print('Error in ' + str(i))
        with open('errors.txt', 'a', encoding="utf_8") as errorfile:
            errorfile.write(str(i) + '\n')

Answer 1

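The host you are talking to did not answer properly. This usually happens when you try to connect to an https service using http, but there can be plenty of other causes as well.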
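To make "did not answer properly" concrete, here is a minimal sketch that reproduces the underlying error locally using only the standard library. It assumes a throwaway local server (the hypothetical helper `serve_once_and_hang_up`) that accepts a connection and hangs up without sending any status line; it is not the real site, and it only illustrates one way the error can arise. requests reports this same low-level `http.client.BadStatusLine` wrapped in `requests.exceptions.ConnectionError`.

```python
import http.client
import socket
import threading


def serve_once_and_hang_up(server_sock):
    """Accept one connection, read the request, then close without replying."""
    conn, _ = server_sock.accept()
    conn.recv(4096)  # read the request so the client finishes sending
    conn.close()     # hang up: no HTTP status line is ever sent back


def request_from_silent_server():
    """Return the exception raised when the server sends no status line."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]

    worker = threading.Thread(target=serve_once_and_hang_up, args=(server_sock,))
    worker.start()

    client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        client.request("GET", "/")
        client.getresponse()  # fails: there is no status line to parse
    except http.client.BadStatusLine as exc:
        return exc
    finally:
        client.close()
        worker.join()
        server_sock.close()
    return None


error = request_from_silent_server()
print(type(error).__name__, "-", error)
```

On Python 3.5 and later the exception is `http.client.RemoteDisconnected`, which is a subclass of `BadStatusLine`; on 3.4 it is `BadStatusLine` itself with an empty line, matching the message in the question.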

Probably the best way to check what is going on is to get a network traffic analyzer (for example Wireshark) and look at the connection.
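If installing a traffic analyzer is not convenient, a lighter stand-in (not a replacement for Wireshark) is the debug output built into the standard library's `http.client`, which prints what is sent and what comes back. The sketch below is only an illustration under assumptions: `QuietHandler` and `trace_one_request` are hypothetical names, and the tiny local server stands in for the real site so that the example runs without network access.

```python
import contextlib
import http.client
import http.server
import io
import threading


class QuietHandler(http.server.BaseHTTPRequestHandler):
    """Tiny local handler standing in for the real site (an assumption)."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the server's own access log


def trace_one_request():
    """Return (status, debug_text) for one GET with http.client tracing on."""
    server = http.server.HTTPServer(("127.0.0.1", 0), QuietHandler)
    thread = threading.Thread(target=server.serve_forever)
    thread.start()
    captured = io.StringIO()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.set_debuglevel(1)  # print the request sent and the reply received
        with contextlib.redirect_stdout(captured):
            conn.request("GET", "/")
            response = conn.getresponse()
            response.read()
        conn.close()
    finally:
        server.shutdown()
        thread.join()
        server.server_close()
    return response.status, captured.getvalue()


status, trace = trace_one_request()
print(trace)
```

In the trace, a `send:` line shows the request and a `reply:` line shows the status line the server returned. A server that hangs up without answering never produces a `reply:` line, which is the situation described above.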

Python: requests.exceptions.ConnectionError: ('Connection aborted.', BadStatusLine("''",))
