Converting href strings into a list of links

Date: 2017-03-28 06:49:08

Tags: python html python-3.x web-scraping beautifulsoup

I am trying to use this code to scrape some statistics from Gosugamers, including match results and team names:

from bs4 import BeautifulSoup
import requests

for i in range(411):
    try:
        i += 1
        print(i)
        url = 'http://www.gosugamers.net/counterstrike/gosubet?r-page={}'.format(i)
        r = requests.get(url)
        web = BeautifulSoup(r.content,"html.parser")
        table = web.findAll("table", attrs={"class":"simple matches"})
        table = table[1]
        links = table('a')
        for link in links:
            if 'matches' in link.get('href', None):
                if len(link.get('href', None)) != 0:
                    print(link.get('href', None))

    except:
        pass

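But link.get('href', None) gives me the links on a page as separate strings, and I don't know how to turn them into a single list of all the links. I would be glad if someone could help. Thanks!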

1 Answer:

Answer 0 (score: 1)

To me it looks like link.get('href', None) actually returns a single link. The documentation for the get method says:

get(self, key, default=None) method of bs4.element.Tag instance

Returns the value of the 'key' attribute for the tag, or
the value given for 'default' if it doesn't have that
attribute.
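
The quoted behaviour can be checked without hitting the site. The following is only a minimal sketch: the HTML snippet and its paths are made up for illustration and are not taken from the real Gosugamers pages.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one results page (not the real site's HTML).
html = """
<table class="simple matches">
  <tr><td><a href="/counterstrike/tournaments/1/matches/100">A vs B</a></td></tr>
  <tr><td><a href="/counterstrike/teams/42">Team page</a></td></tr>
  <tr><td><a>No href at all</a></td></tr>
  <tr><td><a href="/counterstrike/tournaments/1/matches/101">C vs D</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

match_links = []
for link in soup.find_all("a"):
    # get() returns one string per tag, or None when the attribute is missing.
    href = link.get("href")
    if href is not None and "matches" in href:
        match_links.append(href)

print(match_links)
```

Note the explicit None check. In the question's code, a tag without an href makes `'matches' in None` raise a TypeError, and the bare except then silently skips the rest of that page.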

So whenever you get a link that contains 'matches', you can simply append it to a list.

from bs4 import BeautifulSoup
import requests

all_links = []

# Pages are numbered from 1, so iterate over 1..411 directly.
for i in range(1, 412):
    print(i)
    url = 'http://www.gosugamers.net/counterstrike/gosubet?r-page={}'.format(i)
    try:
        r = requests.get(url)
        web = BeautifulSoup(r.content, "html.parser")
        # The question uses the second "simple matches" table on each page.
        table = web.find_all("table", attrs={"class": "simple matches"})[1]
    except (requests.RequestException, IndexError):
        # Skip pages that fail to load or lack the expected table.
        continue

    for link in table('a'):
        href = link.get('href')  # a single string, or None if the tag has no href
        if href is not None and 'matches' in href:
            all_links.append(href)

print("Here are all the links:", all_links)
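
If the collected hrefs turn out to be site-relative paths (an assumption, since the thread never shows their actual values), a possible follow-up is to resolve them against the host and drop repeats. The helper name `to_absolute_unique` and the sample paths below are hypothetical; only the host comes from the URL used in the code above.

```python
from urllib.parse import urljoin

BASE_URL = "http://www.gosugamers.net"  # host taken from the URL used above


def to_absolute_unique(hrefs, base=BASE_URL):
    """Resolve each href against base and drop duplicates, keeping first-seen order."""
    absolute = (urljoin(base, href) for href in hrefs)
    # dict keys keep insertion order (Python 3.7+), so this de-duplicates in order.
    return list(dict.fromkeys(absolute))


# Made-up sample values, for illustration only.
sample = [
    "/counterstrike/tournaments/1/matches/100",
    "/counterstrike/tournaments/1/matches/101",
    "/counterstrike/tournaments/1/matches/100",
]
print(to_absolute_unique(sample))
```

Calling `to_absolute_unique(all_links)` after the loop would then give a list of full URLs that can be passed straight to `requests.get`.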