I wrote this program to visit the 18th link in a list of links, and then visit the 18th link again on the resulting page.
The program works as expected, but it is a bit repetitive and inelegant.
I was wondering if you have any ideas on how to simplify it without using any functions. If I wanted to repeat this process 10 or 100 times, it would get very long.
Thanks for any suggestions!
# Note - this code must run in Python 2.x and you must download
# http://www.pythonlearn.com/code/BeautifulSoup.py
# Into the same folder as this program
import urllib
from BeautifulSoup import *
url = raw_input('Enter - ')
if len(url) < 1:
    url = 'http://python-data.dr-chuck.net/known_by_Oluwanifemi.html'
html = urllib.urlopen(url).read()
soup = BeautifulSoup(html)
# Retrieve all of the anchor tags
tags = soup('a')
urllist = list()
count = 0
loopcount = 0
for tag in tags:
    count = count + 1
    tg = tag.get('href', None)
    if count == 18:
        print count, tg
        urllist.append(tg)
url2 = urllist[0]
html2 = urllib.urlopen(url2).read()
soup2 = BeautifulSoup(html2)
tags2 = soup2('a')
count2 = 0
for tag2 in tags2:
    count2 = count2 + 1
    tg2 = tag2.get('href', None)
    if count2 == 18:
        print count2, tg2
        urllist.append(tg2)
Answer 0 (score: 2)
Here is what you can do.
import urllib
from BeautifulSoup import *
url_1 = raw_input('Enter - ') or 'http://python-data.dr-chuck.net/known_by_Oluwanifemi.html'
html_1 = urllib.urlopen(url_1).read()
soup_1 = BeautifulSoup(html_1)
tags = soup_1('a')
url_retr1 = tags[17].get('href', None)
html_2 = urllib.urlopen(url_retr1).read()
soup_2 = BeautifulSoup(html_2)
tags_2 = soup_2('a')
url_retr2 = tags_2[17].get('href', None)
print url_retr2
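The two nearly identical fetch-and-pick blocks can also be collapsed into a single loop, which addresses the 10-or-100-repeats concern without defining any functions. The sketch below shows only the loop shape: a hard-coded link table stands in for the fetched pages so it runs without a network connection (the names `pages`, `position`, and `hops` are illustrative, not from the original program); in the real program the loop body would call `urllib.urlopen` and `BeautifulSoup` exactly as above.

```python
# Stand-in link table: each "page" maps to the list of hrefs found on it.
pages = {
    'start.html': ['x1', 'x2', 'next.html'],
    'next.html': ['y1', 'y2', 'final.html'],
}

url = 'start.html'
position = 3   # 1-based index of the link to follow (18 in the real task)
hops = 2       # how many times to repeat; 10 or 100 works the same way

for step in range(hops):
    links = pages[url]        # real code: fetch the page and collect its <a> hrefs
    url = links[position - 1] # follow the Nth link into the next iteration
```

After the loop, `url` holds the link reached on the final hop; appending each intermediate `url` to a list reproduces the `urllist` bookkeeping from the question.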