I'm a Python beginner and wrote some code to download all the links from a given URL. Is there a better way to do this, and is the following code correct?
#!/usr/bin/python3
import re
import requests

def get_page(url):
    r = requests.get(url)
    print(r.status_code)
    content = r.text
    return content

if __name__ == "__main__":
    url = 'http://developer.android.com'
    content = get_page(url)
    content_pattern = re.compile('<a href=(.*?)>.*?</a>')
    result = re.findall(content_pattern, content)
    for link in result:
        with open('download.txt', 'wb') as fd:
            for chunk in r.iter_content(chunk_size):
                fd.write(chunk)
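For reference, the last three lines of the question use `r` and `chunk_size`, which only exist inside `get_page`, so they would raise a `NameError`. A minimal corrected sketch of that chunked-download idea (the tightened regex, the default chunk size, and the function name `download` are illustrative assumptions, not part of the original):

```python
#!/usr/bin/python3
import re
import requests

# A tighter regex that captures only the quoted href value
# (still fragile; an HTML parser is more robust).
link_pattern = re.compile(r'<a href="(.*?)">')

def download(url, path, chunk_size=8192):
    # stream=True plus iter_content() writes the body to disk in
    # chunks instead of loading it all into memory at once.
    with requests.get(url, stream=True) as r:
        with open(path, 'wb') as fd:
            for chunk in r.iter_content(chunk_size=chunk_size):
                fd.write(chunk)
```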
Answer 0: (score: 2)

Try this:
from bs4 import BeautifulSoup
import sys
import requests

def get_links(url):
    r = requests.get(url)
    contents = r.content
    # pass a parser explicitly to avoid BeautifulSoup's "no parser specified" warning
    soup = BeautifulSoup(contents, 'html.parser')
    links = []
    for link in soup.find_all('a'):
        try:
            links.append(link['href'])
        except KeyError:
            pass
    return links

if __name__ == "__main__":
    url = sys.argv[1]
    print(get_links(url))
    sys.exit()
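One follow-up worth noting: `href` values are often relative (e.g. `/guide`), so they need to be joined with the page URL before they can be fetched. A small sketch using the standard library's `urljoin` (the helper name `absolutize` is made up for illustration):

```python
from urllib.parse import urljoin

def absolutize(base, hrefs):
    # urljoin leaves already-absolute URLs untouched and resolves
    # relative paths against the base page URL.
    return [urljoin(base, h) for h in hrefs]
```

Feeding the output of `get_links` through a helper like this before downloading avoids requests to malformed URLs.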
Answer 1: (score: 1)
You may want to look into the Linux wget command, which can already do what you're after. If you really want a Python solution, mechanize and Beautiful Soup can handle the HTTP requests and the HTML parsing, respectively.