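I am using the code below to parse the links out of a page fetched from a URL. The links are found, but my counter does not work. Any ideas on how to fix my counter? Thanks.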
def parse_all_links(html):
    links = re.findall(r"""a href=(['"].*['"])""", html)  # find links starting with href
    print("found the following links addresses: ".format(len(html)))  # print a message before the output
    if len(links) == 0:
        print("Sorry, no links found")
    else:
        count = 1  # this count how many links are displayed
        for e in links:
            print(e)
            count += 1
        print('--------------')
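Answer 0 (score: 1)
You probably want to use the len() function to get the length of the list of links, and a dedicated parsing library such as Beautiful Soup to parse the HTML, since it copes very well with malformed or otherwise broken HTML.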
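As a rough sketch of that idea: the example below counts links with len() and uses a real HTML parser instead of a regular expression. It swaps in the standard-library html.parser module rather than Beautiful Soup, so it runs without installing anything. With Beautiful Soup the equivalent lookup would be find_all("a", href=True). The LinkCollector class and the sample string are made up for illustration and are not part of the original code.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def parse_all_links(html):
    """Return every link address found in the given HTML string."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


# Illustrative input only; any HTML string works here
sample = '<p><a href="http://a.example/">A</a> <a href=\'http://b.example/\'>B</a></p>'
links = parse_all_links(sample)
for link in links:
    print(link)
# len() gives the number of links directly, so no manual counter is needed
print("--------------\nCount:{}".format(len(links)))
```

Returning the list instead of printing inside the function keeps the count a single len() call at the call site.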
Answer 1 (score: 1)
I do not fully understand your question, but your code has a few small problems. Let me know whether this helps:
import re
import requests


def parse_all_links(html):
    links = re.findall(r"""a href=(['"].*['"])""", html)  # find links starting with href
    print("found the following links addresses: ")  # print a message before the output
    if len(links) == 0:
        print("Sorry, no links found")
    else:
        count = 0  # counts how many links are displayed
        for e in links:
            print(e)
            count += 1
        # print the total once, after the loop has finished
        print('--------------\nCount:{}'.format(count))


parse_all_links(requests.get("http://www.onet.pl").text)
I tested the solution and it works. Sample output:
...
"https://zapytaj.onet.pl/Zadania/testy/index.html"
"https://zapytaj.onet.pl/quizy/index.html"
"https://zapytaj.onet.pl/Category/005/1,Biznes_i_Finanse.html"
"https://zapytaj.onet.pl/Category/029/1,Gry.html"
"https://zapytaj.onet.pl/Category/028/1,Hobby.html"
"https://zapytaj.onet.pl/Category/021/1,Dla_Doroslych.html"
"https://zapytaj.onet.pl/Category/009/1,Dom_i_Ogrod.html"
"https://zapytaj.onet.pl/Category/016/1,Jedzenie_i_Napoje.html"
"http://zapytaj.onet.pl"
"https://polityka-prywatnosci.onet.pl/"
"http://reklama.onet.pl/"
"http://ofirmie.onet.pl/0,0,0,PL,aktualne_ogloszenia,oferta.html"
"http://onettechnologie.pl/"
"http://www.dreamlab.pl/"
--------------
Count:319