Question

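This code gets the links from an HTML page, but I want it to give me only the links that contain a certain word. For example, only links that contain this word: "www.mywebsite.com/word"

My code: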

import httplib2
from BeautifulSoup import BeautifulSoup, SoupStrainer

http = httplib2.Http()
status, response = http.request('http://www.mywebsite.com')



for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):

    if link.has_key('href'):
        print link['href']

Answer 1

You can use a simple string search. The following example prints only the links that have '/website-builder' in the href.

if '/website-builder' in link['href']:
    print link['href']

Full code:

import httplib2
from BeautifulSoup import BeautifulSoup, SoupStrainer

http = httplib2.Http()
status, response = http.request('http://www.mywebsite.com')

for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    if link.has_key('href'):
        if '/website-builder' in link['href']:
            print link['href']

Sample output:

/website-builder?linkOrigin=website-builder&linkId=hd.mainnav.mywebsite
/website-builder?linkOrigin=website-builder&linkId=hd.subnav.mywebsite.mywebsite
/website-builder?linkOrigin=website-builder&linkId=hd.subnav.hosting.mywebsite
/website-builder?linkOrigin=website-builder&linkId=ct.btn.stickynavigation.easy-to-use#easy-to-use
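
The code above targets Python 2 and BeautifulSoup 3. As a rough sketch of the same substring filter on Python 3, the following uses only the standard library, swapping in `html.parser.HTMLParser` for BeautifulSoup (so it is not what this answer itself uses). The names `LinkCollector` and `links_containing` and the sample HTML are hypothetical, introduced only for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag (hypothetical helper, not a BeautifulSoup API)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        for name, value in attrs:
            # value is None for a bare attribute such as <a href>
            if name == 'href' and value is not None:
                self.hrefs.append(value)


def links_containing(html, word):
    """Return the hrefs found in html that contain word as a substring."""
    collector = LinkCollector()
    collector.feed(html)
    return [href for href in collector.hrefs if word in href]


# Made-up sample page, standing in for the HTTP response body.
sample_html = '''
<a href="/website-builder?linkId=hd.mainnav">Builder</a>
<a href="/hosting">Hosting</a>
<a name="top">No href here</a>
'''

for href in links_containing(sample_html, '/website-builder'):
    print(href)
```

The filter itself is still the plain `word in href` test from the answer; only the way the links are extracted differs.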

Answer 2

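Here is what I came up with:

# Tag.find() searches for child tags, not substrings, so test the href text instead.
links = [link for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')) if "word" in link.get('href', '')]
print links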

Of course, you should replace "word" with whatever word you want to filter on.
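
Since the word is meant to be swapped out, it may be convenient to pass it as a parameter. Below is a minimal Python 3 sketch that works on plain href strings, so it does not depend on any particular parser. The function `filter_hrefs`, its `ignore_case` option, and the sample list are assumptions made for illustration, not part of the answer.

```python
def filter_hrefs(hrefs, word, ignore_case=False):
    """Return the hrefs that contain word as a substring, preserving order."""
    if ignore_case:
        word = word.lower()
        return [h for h in hrefs if word in h.lower()]
    return [h for h in hrefs if word in h]


# Made-up hrefs for illustration.
hrefs = ['/word/page1', '/about', '/Word-list', 'http://www.mywebsite.com/word']

print(filter_hrefs(hrefs, 'word'))
print(filter_hrefs(hrefs, 'word', ignore_case=True))
```

The first call matches only the lowercase occurrences; the second also picks up '/Word-list'.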

Answer 3

Full code:

import httplib2
from BeautifulSoup import BeautifulSoup, SoupStrainer

http = httplib2.Http()
status, response = http.request('http://www.google.com')

for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    if link.has_key('href'):
        if '/website-builder' in link['href']:
            print link['href']

How do you use BeautifulSoup to get links that contain certain words?

3 answers: