How do you use BeautifulSoup to get only the links that contain certain words?

Asked: 2015-03-11 20:12:57

Tags: python url web beautifulsoup httplib2

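This code gets the links from an HTML web page, but I want it to give me only the links that contain certain words. For example, only links that contain this: "www.mywebsite.com/word"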

My code:

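import httplib2
from BeautifulSoup import BeautifulSoup, SoupStrainer

# http.request() returns a (response, content) tuple:
# `status` holds the response headers, `response` the page body
http = httplib2.Http()
status, response = http.request('http://www.mywebsite.com')

# Parse only the <a> tags and print each href
for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    if link.has_key('href'):
        print link['href']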
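For readers on Python 3 without BeautifulSoup 3 installed, here is a minimal sketch of the same idea using only the standard library's html.parser. This swaps BeautifulSoup for HTMLParser, and it takes an HTML string rather than a live HTTP response; the class HrefCollector, the function links_with_word and the sample HTML are hypothetical names for illustration.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names;
        # attrs is a list of (name, value) pairs and value may be None
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def links_with_word(html, word):
    """Return the hrefs of <a> tags whose href contains `word` (hypothetical helper)."""
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return [href for href in collector.hrefs if word in href]


# Hypothetical sample page: one matching link, one non-matching, one anchor without href
sample = ('<a href="http://www.mywebsite.com/word">x</a>'
          '<a href="/other">y</a><a name="top">z</a>')
print(links_with_word(sample, "/word"))
# ['http://www.mywebsite.com/word']
```

The filter is applied after parsing, so the same collector can be reused with a different word without re-fetching the page.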

3 answers:

Answer 0 (score: 2)

You can use a simple string search. The following example prints only the links that have '/website-builder' in their href.

if '/website-builder' in link['href']:
    print link['href']
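The `in` test above is an ordinary, case-sensitive substring check on the href string. A small Python 3 sketch of that filter in isolation, with an optional case-insensitive mode; filter_hrefs, its case_sensitive parameter and the sample hrefs are hypothetical.

```python
def filter_hrefs(hrefs, word, case_sensitive=True):
    """Keep only the hrefs that contain `word` as a plain substring."""
    if case_sensitive:
        return [h for h in hrefs if word in h]
    # Compare lowercased copies so "/Website-Builder" also matches
    needle = word.lower()
    return [h for h in hrefs if needle in h.lower()]


hrefs = [
    "/website-builder?linkOrigin=website-builder",
    "/hosting",
    "/Website-Builder/pricing",
]
print(filter_hrefs(hrefs, "/website-builder"))
# ['/website-builder?linkOrigin=website-builder']
print(filter_hrefs(hrefs, "/website-builder", case_sensitive=False))
# ['/website-builder?linkOrigin=website-builder', '/Website-Builder/pricing']
```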

Full code:

import httplib2
from BeautifulSoup import BeautifulSoup, SoupStrainer

http = httplib2.Http()
status, response = http.request('http://www.mywebsite.com')

for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    if link.has_key('href'):
        if '/website-builder' in link['href']:
            print link['href']

Sample output:

/website-builder?linkOrigin=website-builder&linkId=hd.mainnav.mywebsite
/website-builder?linkOrigin=website-builder&linkId=hd.subnav.mywebsite.mywebsite
/website-builder?linkOrigin=website-builder&linkId=hd.subnav.hosting.mywebsite
/website-builder?linkOrigin=website-builder&linkId=ct.btn.stickynavigation.easy-to-use#easy-to-use

Answer 1 (score: 0)

Here is what I came up with:

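# Test the href string itself; Tag.find() searches for child tags and returns None, not -1
links = [link for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a'))
         if link.has_key('href') and "word" in link['href']]
print links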

Of course, you should replace "word" with whatever word you want to filter on.
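A plain substring test also matches longer words, e.g. "word" inside "/wordpress". If that matters, one option (an assumption about what "contains the word" should mean, not something this answer specifies) is a regular expression that only accepts the word as a whole path segment. hrefs_with_segment and the sample hrefs are hypothetical; note that the pattern scans the entire href, query string included, so it is a rough heuristic rather than a URL parser.

```python
import re


def hrefs_with_segment(hrefs, segment):
    """Keep hrefs in which `segment` appears as a whole path component."""
    # (?:^|/)      : segment starts the string or follows a slash
    # (?=[/?#]|$)  : segment is followed by /, ?, # or the end of the string
    pattern = re.compile(r"(?:^|/)" + re.escape(segment) + r"(?=[/?#]|$)")
    return [h for h in hrefs if pattern.search(h)]


hrefs = [
    "http://www.mywebsite.com/word",
    "http://www.mywebsite.com/wordpress",
    "/word?page=2",
    "/other/word/",
]
print(hrefs_with_segment(hrefs, "word"))
# ['http://www.mywebsite.com/word', '/word?page=2', '/other/word/']
```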

Answer 2 (score: 0)

Full code:

import httplib2
from BeautifulSoup import BeautifulSoup, SoupStrainer

http = httplib2.Http()
status, response = http.request('http://www.google.com')

for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    if link.has_key('href'):
        if '/website-builder' in link['href']:
            print link['href']