OK, I'm writing a program that tests whether certain pages of a website are offline or online.
import urllib2
u = 'http://www.google.com/'
pages = open('pages.txt', 'r').readlines()
for page in pages:
    url = u + page
    try:
        req = urllib2.urlopen(url)
    except urllib2.HTTPError as e:
        if e.code == 404:
            print url + " does not exists"
    else:
        print url + " exists"
and "pages.txt" contains something like this:
search
page
plus
signin
account
security
lol
about
contactus
someotherpage.html
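The program works fine now, but I'd like it to store the available pages in a txt file. Can anyone help me? It would also be great if it listed only the pages that exist and left out the offline ones. Thanks :))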
Answer 0: (score: 0)
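How about:
python your_script > available.txt
(Redirect to a file other than Pages.txt: the shell truncates the target file before the script gets a chance to read it.)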
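One catch with plain redirection is that it captures every line the script prints, including the "does not exists" messages. A minimal sketch of one way around that, written for Python 3 (an assumption, since the question uses Python 2): send available URLs to stdout and everything else to stderr, so that `>` only collects the available ones. The helper name `report` is made up for illustration.

```python
import contextlib
import io
import sys

def report(url, is_available):
    """Print available URLs to stdout and the rest to stderr."""
    if is_available:
        print(url)  # ends up in the redirected file
    else:
        print(url + " does not exist", file=sys.stderr)  # stays on the terminal

# Demonstration: capture stdout the way a shell redirect would.
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    report("http://www.google.com/search", True)
    report("http://www.google.com/lol", False)
stdout_text = captured.getvalue()
```

With this split, `python your_script > available.txt` would leave only the available URLs in the file while the "does not exist" lines still show up on screen.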
Edit:
To write to a file:
with open('available.txt', 'w') as f:
    f.write('something')
# the with block closes the file automatically, so no f.close() is needed
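To make the `with open(...)` pattern a bit more concrete, here is a small sketch (Python 3 assumed) that writes a list of URLs one per line and confirms the file is closed once the block ends. The function name `write_lines` and the temporary directory are only there for the demonstration.

```python
import os
import tempfile

def write_lines(path, lines):
    """Write each item of `lines` to `path`, one per line."""
    with open(path, "w") as f:
        for line in lines:
            f.write(line + "\n")
    # Leaving the with block has already closed the file.
    return f.closed

# Demonstration in a throwaway directory so no real file is overwritten.
demo_path = os.path.join(tempfile.mkdtemp(), "available.txt")
was_closed = write_lines(
    demo_path,
    ["http://www.google.com/search", "http://www.google.com/plus"],
)
with open(demo_path) as f:
    written = f.read()
```

Note that mode `'w'` starts the file from scratch on every run, which is usually what you want for a fresh list of available pages.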
Answer 1: (score: 0)
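Just write to a file the same way you are reading one, but use a separate output file so the input list is not overwritten:
out = open('available.txt', 'w')
...and then, in the else: branch you already wrote:
out.write(url + "\n")
Giving:
import urllib2
u = 'http://www.google.com/'
pages = open('pages.txt', 'r').readlines()
out = open('available.txt', 'w')  # separate file, so pages.txt stays intact
for page in pages:
    url = u + page.strip()  # drop the trailing newline left by readlines()
    try:
        req = urllib2.urlopen(url)
    except urllib2.HTTPError as e:
        if e.code == 404:
            print url + " does not exists"
    else:
        print url + " exists"
        out.write(url + "\n")
out.close()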
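For anyone on Python 3, where `urllib2` was split into `urllib.request` and `urllib.error`, the same idea might look roughly like the sketch below. It is not the code from this answer: the names `page_exists` and `filter_available` are made up, the `example.test` URLs are placeholders, and unlike the original it treats any HTTP error (not just 404) as "not available". The checker is passed in as a parameter so the filtering can be tried without touching the network.

```python
import urllib.error
import urllib.request

def page_exists(url):
    """Return True if the URL opens without an HTTP error status."""
    try:
        urllib.request.urlopen(url).close()
    except urllib.error.HTTPError:
        return False  # assumption: any HTTP error counts as unavailable
    return True

def filter_available(base, pages, exists=page_exists):
    """Return the full URLs of the pages for which `exists` is true."""
    available = []
    for page in pages:
        page = page.strip()  # remove the newline kept by readlines()
        if not page:
            continue  # skip blank lines in the page list
        url = base + page
        if exists(url):
            available.append(url)
    return available

# Offline demonstration with a fake checker instead of real requests.
fake_status = {
    "http://example.test/search": True,
    "http://example.test/lol": False,
}
result = filter_available(
    "http://example.test/",
    ["search\n", "lol\n", "\n"],
    exists=fake_status.get,
)
```

In a real run you would call `filter_available(u, pages)` with the default checker and write the returned list to the output file.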
Answer 2: (score: 0)
Open the file for writing in append mode, and redirect the print statement so that it prints to the new file handle.
import urllib2
u = raw_input('Enter a url: ') or 'http://www.google.com/'
pages = open('pages.txt', 'r').readlines()
with open('available.txt', 'a') as available:
    for page in pages:
        url = u.rstrip('\n')+page
        try:
            req = urllib2.urlopen(url)
        except urllib2.HTTPError as e:
            if e.code == 404:
                print url+" does not exists"
        else:
            print url+" exists"
            print >> available, url.rstrip('\n')
Output:
(availablepages)macbook:availablepages joeyoung$ ls -al
total 16
drwxr-xr-x 4 joeyoung staff 136 Sep 7 00:23 .
drwxr-xr-x 4 joeyoung staff 136 Sep 6 23:54 ..
-rw-r--r-- 1 joeyoung staff 478 Sep 7 00:20 availablepages.py
-rw-r--r-- 1 joeyoung staff 70 Sep 6 23:56 pages.txt
(availablepages)macbook:availablepages joeyoung$ python availablepages.py
Enter a url: http://www.google.com/
http://www.google.com/search
exists
http://www.google.com/page
does not exists
http://www.google.com/plus
exists
http://www.google.com/signin
does not exists
http://www.google.com/account
exists
http://www.google.com/security
exists
http://www.google.com/lol
does not exists
http://www.google.com/about
exists
http://www.google.com/someotherpage.html
does not exists
(availablepages)macbook:availablepages joeyoung$ ls -al
total 24
drwxr-xr-x 5 joeyoung staff 170 Sep 7 00:23 .
drwxr-xr-x 4 joeyoung staff 136 Sep 6 23:54 ..
-rw-r--r-- 1 joeyoung staff 145 Sep 7 00:23 available.txt
-rw-r--r-- 1 joeyoung staff 478 Sep 7 00:20 availablepages.py
-rw-r--r-- 1 joeyoung staff 70 Sep 6 23:56 pages.txt
(availablepages)macbook:availablepages joeyoung$ cat available.txt
http://www.google.com/search
http://www.google.com/plus
http://www.google.com/account
http://www.google.com/security
http://www.google.com/about
(availablepages)macbook:availablepages joeyoung$ python availablepages.py
Enter a url: http://www.bing.com/
http://www.bing.com/search
exists
http://www.bing.com/page
does not exists
http://www.bing.com/plus
does not exists
http://www.bing.com/signin
does not exists
http://www.bing.com/account
exists
http://www.bing.com/security
does not exists
http://www.bing.com/lol
does not exists
http://www.bing.com/about
does not exists
http://www.bing.com/someotherpage.html
does not exists
(availablepages)macbook:availablepages joeyoung$ ls -al
total 24
drwxr-xr-x 5 joeyoung staff 170 Sep 7 00:23 .
drwxr-xr-x 4 joeyoung staff 136 Sep 6 23:54 ..
-rw-r--r-- 1 joeyoung staff 200 Sep 7 00:24 available.txt
-rw-r--r-- 1 joeyoung staff 478 Sep 7 00:20 availablepages.py
-rw-r--r-- 1 joeyoung staff 70 Sep 6 23:56 pages.txt
(availablepages)macbook:availablepages joeyoung$ cat available.txt
http://www.google.com/search
http://www.google.com/plus
http://www.google.com/account
http://www.google.com/security
http://www.google.com/about
http://www.bing.com/search
http://www.bing.com/account
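The session above shows available.txt growing from one run to the next, which is what append mode does. As a rough Python 3 counterpart (an assumption, since the answer itself is Python 2), `print(url, file=handle)` plays the role of `print >> handle, url`. The function name `record_available` and the temporary path are only for illustration.

```python
import os
import tempfile

def record_available(path, urls):
    """Append each URL to `path`, one per line, keeping earlier content."""
    with open(path, "a") as available:
        for url in urls:
            print(url, file=available)  # Python 3 form of `print >> available, url`

# Two separate "runs" against the same file in a throwaway directory.
log_path = os.path.join(tempfile.mkdtemp(), "available.txt")
record_available(log_path, ["http://www.google.com/search"])
record_available(log_path, ["http://www.bing.com/search"])
with open(log_path) as f:
    logged = f.read().splitlines()
```

If you would rather start with a clean list on every run, open the file with `'w'` instead of `'a'`.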