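When I run the code below with the first url, nothing seems to happen, and the last print statement to produce output is #2.
But when I run it with the second link, I am able to get the document I need from the website, save it as a text file, and go on to parse it. I have a bunch of other links that I loop through, and they all work too.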
import requests
import os
import logging
logging.basicConfig(filename='example.log', filemode='w', level=logging.NOTSET)
#This one doesn't work
url = "https://www.sec.gov/Archives/edgar/full-index/2017/QTR3/master.idx"
#This one does work
#url = "https://www.sec.gov/Archives/edgar/full-index/2017/QTR4/master.idx"
print "1"
r = requests.get(url)
textfile = open("ugh2.txt", "w")
print "2"
textfile.write(r.text.encode("ascii", "ignore"))
textfile.close()
print "3"
textfile = open("ugh2.txt", "r")
count = 0
linkList = []
print "4"
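I have tried every kind of debugging I know how to do, and I can't figure out what the problem is. The log for the first url shows: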
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.sec.gov
DEBUG:urllib3.connectionpool:https://www.sec.gov:443 "GET /Archives/edgar/full-index/2017/QTR3/master.idx HTTP/1.1" 200 2730517
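One way to narrow this down, offered as a sketch rather than a confirmed fix: the log shows the GET coming back with status 200 and a 2730517-byte body, and "2" is printed, so the stall appears to be in the r.text.encode(...) line rather than in the download. Accessing r.text decodes the body, and when the response headers declare no charset, requests guesses the encoding from the bytes, which can be slow on a body of several megabytes. Whether that is what actually happens for this url is an assumption. The sketch below writes r.content (the raw bytes) instead, so no decoding or guessing takes place. The names save_raw_body, decode_lossy_ascii and fetch_and_save are hypothetical helpers, and the timeout of 60 seconds is an arbitrary choice.

```python
def save_raw_body(response, path):
    """Write the undecoded body to path and return the number of bytes written."""
    body = response.content  # raw bytes; unlike response.text, nothing is decoded
    with open(path, "wb") as fh:  # binary mode, because body is bytes
        fh.write(body)
    return len(body)


def decode_lossy_ascii(raw_bytes):
    """Decode bytes as ASCII, silently dropping every non-ASCII byte."""
    return raw_bytes.decode("ascii", "ignore")


def fetch_and_save(url, path, timeout=60):
    """Download url and store its body at path without touching response.text."""
    import requests  # imported here so the helpers above work without it

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise on a 4xx or 5xx status instead of saving an error page
    return save_raw_body(response, path)
```

It would be called as fetch_and_save(url, "ugh2.txt"). Note that, unlike the original encode("ascii", "ignore") call, this keeps any non-ASCII bytes in the file; decode_lossy_ascii can be applied to the bytes when reading the file back if they need to be dropped. If the text form is needed directly, setting r.encoding to a known encoding before reading r.text should also avoid the guessing step.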