Python: counting the links parsed from a web page

Date: 2018-08-01 10:20:14

Tags: python parsing url hyperlink counter

I'm using the code below to parse the links from a URL. The links are found, but my counter doesn't work. Any ideas on how to fix my counter? Thanks

def parse_all_links(html):

    links =  re.findall(r"""a href=(['"].*['"])""", html)#find links starting with href
    print("found the following links addresses: ".format(len(html)))#print a message before the output

    if len(links) ==0:
        print("Sorry, no links found")
    else:
        count = 1#this count how many links are displayed
        for e in links:
            print(e)
            count += 1

    print('--------------')

2 answers:

Answer 0 (score: 1)

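You probably want to use the len() function to get the length of the links list, and to parse the HTML with a dedicated parsing library such as Beautiful Soup, since it handles malformed or badly formatted HTML like a champ.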
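
Beautiful Soup is a third-party package, so as a dependency-free sketch of the same idea the code below swaps in the standard library's html.parser.HTMLParser: a real parser collects the href of every <a> tag, and len() of the resulting list is the count, so no manual counter is needed. The class name LinkCollector and the choice to return the list are illustrative assumptions, not part of the original answer.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href value of every <a> tag it is fed."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; attrs is a list of (name, value) pairs
        if tag == "a":
            for name, value in attrs:
                # value is None for a bare attribute such as <a href>
                if name == "href" and value is not None:
                    self.links.append(value)


def parse_all_links(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    links = collector.links
    if not links:
        print("Sorry, no links found")
    else:
        # len() replaces the hand-maintained counter
        print("found {} link addresses:".format(len(links)))
        for link in links:
            print(link)
    print("--------------")
    return links


sample = '<a href="https://example.com/a">A</a> <A HREF=\'/b\'>B</A> <a name="top">no href</a>'
links = parse_all_links(sample)
print(len(links))  # 2
```

With Beautiful Soup installed, the equivalent lookup is BeautifulSoup(html, "html.parser").find_all("a", href=True), and len() of that result gives the count.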

Answer 1 (score: 1)

I don't fully understand your question, but there are a few small problems with your code. Let me know if this helps:

import re
import requests


def parse_all_links(html):
    links = re.findall(r"""a href=(['"].*['"])""", html)  # find links starting with href
    print("found {} link addresses:".format(len(links)))  # print a message before the output

    count = 0  # counts the links displayed; set before the if so it exists in both branches
    if len(links) == 0:
        print("Sorry, no links found")
    else:
        for e in links:
            print(e)
            count += 1

    print('--------------\nCount:{}'.format(count))


parse_all_links(requests.get("http://www.onet.pl").text)

I tested the solution and it works. Sample output:

...
"https://zapytaj.onet.pl/Zadania/testy/index.html"
"https://zapytaj.onet.pl/quizy/index.html"
"https://zapytaj.onet.pl/Category/005/1,Biznes_i_Finanse.html"
"https://zapytaj.onet.pl/Category/029/1,Gry.html"
"https://zapytaj.onet.pl/Category/028/1,Hobby.html"
"https://zapytaj.onet.pl/Category/021/1,Dla_Doroslych.html"
"https://zapytaj.onet.pl/Category/009/1,Dom_i_Ogrod.html"
"https://zapytaj.onet.pl/Category/016/1,Jedzenie_i_Napoje.html"
"http://zapytaj.onet.pl"
"https://polityka-prywatnosci.onet.pl/"
"http://reklama.onet.pl/"
"http://ofirmie.onet.pl/0,0,0,PL,aktualne_ogloszenia,oferta.html"
"http://onettechnologie.pl/"
"http://www.dreamlab.pl/"
--------------
Count:319
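
One more thing that affects the number printed: in the pattern above, `.*` is greedy, so when two links sit on the same line (`.` does not match a newline by default) they are merged into one match and the count comes out too low. A minimal sketch, assuming you want to keep using a regex: make the match lazy with `.*?` and require the closing quote to be the same character as the opening one. The names GREEDY, LAZY and find_links are hypothetical. This pattern still only matches when `href` directly follows `a `, which is another reason to prefer a real HTML parser.

```python
import re

# The pattern from the question: .* is greedy and runs to the LAST quote on the line
GREEDY = re.compile(r"""a href=(['"].*['"])""")
# Lazy alternative: (.*?) stops at the first closing quote, and \1 requires
# the same quote character that opened the value
LAZY = re.compile(r"""a href=(['"])(.*?)\1""")


def find_links(html):
    """Return the href values (without quotes) matched by the lazy pattern."""
    return [m.group(2) for m in LAZY.finditer(html)]


html = '<a href="/one">One</a> <a href="/two">Two</a>'
print(GREEDY.findall(html))  # ['"/one">One</a> <a href="/two"']
links = find_links(html)
print(links, len(links))     # ['/one', '/two'] 2
```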