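Python beginner here (with no specialized web development knowledge): an HTML page shows a list of a person's friends, and each name has an anchor tag with a link to that friend's own friend list. Since the page has a timer, I wrote a Python script that loops to scrape the m-th position (friend) on each of n pages (counts): (m -> n -> m -> n ...). And it works!
Code: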
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl
# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
url = input('Enter URL: ')
position = int(input('Enter position: '))  # which link (friend) to follow on each page, 1-based
count = int(input('Enter count: '))  # how many pages to follow in a row
print("Retrieving:", url)
for c in range(count):  # follow `count` links in a row
    html = urllib.request.urlopen(url, context=ctx).read()  # fetch the current page
    soup = BeautifulSoup(html, 'html.parser')
    a_tags = soup('a')  # all anchor tags on the page
    link = a_tags[position-1].get('href', None)  # href of the link at the given position
    content = a_tags[position-1].contents  # friend's name (the anchor's contents)
    url = link  # the next page to retrieve
    print("Retrieving:", url)
Input:
Enter URL: http://py4e-data.dr-chuck.net/known_by_Kory.html
Enter position: 1
Enter count: 10
Output:
Retrieving: http://py4e-data.dr-chuck.net/known_by_Kory.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Shaurya.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Raigen.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Dougal.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Aonghus.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Daryn.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Pauline.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Laia.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Iagan.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Leanna.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Malakhy.html
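Question:
Is there a better way to approach this? (Libraries, workarounds for the delay timer, haha.)
My goal is to build an exhaustive list of all the uniquely named friends found here; I don't need any code, just suggestions and an approach.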
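For reference, the goal of collecting every unique name could be approached with a breadth-first walk over the friend pages that remembers which URLs it has already visited. The following is only a rough sketch under several assumptions: every anchor on a page is taken to be a friend link, the anchor text is taken to be the friend's name, and `LinkCollector`, `fetch`, `crawl_friends`, `delay`, and `max_pages` are hypothetical names introduced for illustration. It uses the standard library's `html.parser` in place of BeautifulSoup so that it runs without extra installs.

```python
import time
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects (href, anchor text) for every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []      # list of (href, anchor text) pairs
        self._href = None    # href of the <a> currently being read
        self._text = []      # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def fetch(url):
    """Download a page and return it as text (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl_friends(start_url, fetch_page=fetch, delay=1.0, max_pages=50):
    """Breadth-first walk over friend pages; returns the set of names seen."""
    seen_urls = {start_url}        # never queue the same page twice
    queue = deque([start_url])
    names = set()                  # a set keeps the names unique
    pages = 0
    while queue and pages < max_pages:
        url = queue.popleft()
        parser = LinkCollector()
        parser.feed(fetch_page(url))
        pages += 1
        for href, name in parser.links:
            if not href:
                continue
            if name:
                names.add(name)
            link = urljoin(url, href)   # resolve relative links
            if link not in seen_urls:
                seen_urls.add(link)
                queue.append(link)
        if queue and delay:
            time.sleep(delay)           # pause between requests
    return names
```

Calling `crawl_friends('http://py4e-data.dr-chuck.net/known_by_Kory.html')` would walk outward from the starting page until the queue is empty or `max_pages` is reached. The cap and the pause between requests are there to keep the crawl bounded and polite; how large the full network is cannot be known in advance, so the cap may need raising.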