import re, urllib.request
patern = re.compile(r'image/\w*\W*\w*\.\jpg', re.I|re.M)
file = open('APODLinks.txt','r')
rf = file.read()
a = rf.split('\n')
file.close()
def lic(li):
    if not li:
        pass
    else:
        print(li[0])
        f.write('http://apod.nasa.gov/apod/%s\n' % li[0])
def main():
    for i in range(len(a)):
        ur = urllib.request.urlopen(a[i])
        mf = re.findall(patern, str(ur.read()))
        lic(mf)
f = open('APODImgs.txt','w')
main()
f.close()
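What is wrong with my code? I am trying to write a txt file listing all the jpg images from the Astronomy Picture of the Day, but the file APODImgs.txt is empty... Sometimes the mf list is empty, so maybe that is my problem...
APODLinks.txt contains URLs like these: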
apod.nasa.gov/apod/ap140815.html
apod.nasa.gov/apod/ap140814.html
apod.nasa.gov/apod/ap140813.html
(7000 lines of URLs)
APODImgs.txt must look like this:
apod.nasa.gov/apod/image/1408/Persei93_1abolfath.jpg
apod.nasa.gov/apod/image/1408/Supermoon_20140810.JPG
apod.nasa.gov/apod/image/1408/m57_nasagendler_3000.jpg
apod.nasa.gov/apod/image/1408/HebesChasma_esa_1024.jpg
...
Please help, and sorry for my English...
Answer 0 (score: 1)
The not li check in lic is most likely always true, because the regular expression does not match.
To find out, print the HTTP response body:
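urr = urllib.request.urlopen(a[i]).read()
print(repr(urr))
# read() returns bytes in Python 3, so decode before applying a str pattern
mf = re.findall(patern, urr.decode('utf-8', errors='replace'))
print(repr(mf))
lic(mf)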
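The regular expression can also be checked without any network traffic by running it against the file names listed in the question. The sketch below assumes Python 3.6 or later, where an unknown escape such as \j makes re.compile raise re.error. The hyphenated file name is a made-up example, not a real APOD image; it only shows one way mf can end up empty even with a valid pattern.

```python
import re

# Pattern from the question, with the stray backslash before "jpg" removed.
fixed = re.compile(r'image/\w*\W*\w*\.jpg', re.I | re.M)

# Paths taken from the expected output in the question.
samples = [
    'image/1408/Persei93_1abolfath.jpg',
    'image/1408/Supermoon_20140810.JPG',
    'image/1408/m57_nasagendler_3000.jpg',
    'image/1408/HebesChasma_esa_1024.jpg',
]
matches = [fixed.findall('<a href="%s">' % s) for s in samples]

# The original pattern contains \j, which is not a valid escape.
try:
    re.compile(r'image/\w*\W*\w*\.\jpg', re.I | re.M)
    original_compiles = True
except re.error:
    original_compiles = False

# Hypothetical file name with a hyphen: \w* stops at "-", so nothing matches.
hyphenated = fixed.findall('<a href="image/1408/some-name.jpg">')

print(matches)
print(original_compiles)
print(hyphenated)
```

If the last list is empty for names that really occur on the pages, the pattern itself needs to be loosened, not just corrected.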
Answer 1 (score: 0)
I changed my code and it works!!!
import re, urllib.request
patern = re.compile(r'image/\w*\W*\w*\.jpg', re.I|re.M)
file = open('APODLinks.txt','r')
rf = file.read()
a = rf.split('\n')
file.close()
def lic(li):
    if not li:
        print("No matches found")
    else:
        print('http://apod.nasa.gov/apod/%s' % li[0])
        f.write('http://apod.nasa.gov/apod/%s\n' % li[0])
def main():
    for i in range(len(a)):
        try:
            ur = urllib.request.urlopen(a[i])
        except:
            print('404 not found!')
            continue  # skip this URL; ur holds no valid response here
        mf = re.findall(patern, str(ur.read()))
        lic(mf)
f = open('APODImgs.txt','w')
main()
f.close()
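For reference, a tidier variant of the same idea is sketched below. It rests on a few assumptions: the function names normalize_url, first_image and collect_images are made up for illustration, the links in APODLinks.txt may lack the http:// scheme (as they appear in the question), and the pages decode acceptably as UTF-8 with bad bytes replaced. It closes both files automatically and skips a page when the request fails.

```python
import re
import urllib.request

# Same pattern as in the code above; it still misses names containing "-".
PATTERN = re.compile(r'image/\w*\W*\w*\.jpg', re.I)
BASE = 'http://apod.nasa.gov/apod/'


def normalize_url(line):
    """Strip whitespace and add a scheme if the line has none (assumption)."""
    line = line.strip()
    if line and '://' not in line:
        line = 'http://' + line
    return line


def first_image(html):
    """Return the full URL of the first matching image, or None."""
    found = PATTERN.findall(html)
    return BASE + found[0] if found else None


def collect_images(links_path='APODLinks.txt', out_path='APODImgs.txt'):
    """Fetch every page listed in links_path and write one image URL per line."""
    with open(links_path) as links, open(out_path, 'w') as out:
        for line in links:
            url = normalize_url(line)
            if not url:
                continue  # skip blank lines, e.g. after the final newline
            try:
                with urllib.request.urlopen(url) as response:
                    html = response.read().decode('utf-8', errors='replace')
            except (OSError, ValueError) as exc:
                # OSError covers URLError/HTTPError; ValueError covers bad URLs.
                print('skipped %s: %s' % (url, exc))
                continue
            image = first_image(html)
            if image:
                out.write(image + '\n')
```

Calling collect_images() with the default arguments reads APODLinks.txt and writes APODImgs.txt in the current directory.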