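This code gets the links from an HTML page, but I want it to return only the links that contain certain words.
For example, only links that contain this word: "www.mywebsite.com/word"
My code: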
import httplib2
from BeautifulSoup import BeautifulSoup, SoupStrainer

http = httplib2.Http()
status, response = http.request('http://www.mywebsite.com')

for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    if link.has_key('href'):
        print link['href']
Answer 0: (score: 2)
You can use a simple substring test. The following example prints only the links that have '/website-builder' in their href.
if '/website-builder' in link['href']:
    print link['href']
Full code:
import httplib2
from BeautifulSoup import BeautifulSoup, SoupStrainer

http = httplib2.Http()
status, response = http.request('http://www.mywebsite.com')

for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    if link.has_key('href'):
        # Print the link only if its href contains the wanted substring.
        if '/website-builder' in link['href']:
            print link['href']
Sample output:
/website-builder?linkOrigin=website-builder&linkId=hd.mainnav.mywebsite
/website-builder?linkOrigin=website-builder&linkId=hd.subnav.mywebsite.mywebsite
/website-builder?linkOrigin=website-builder&linkId=hd.subnav.hosting.mywebsite
/website-builder?linkOrigin=website-builder&linkId=ct.btn.stickynavigation.easy-to-use#easy-to-use
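On Python 3, the `BeautifulSoup` 3 import and the `print` statement used above no longer work, so the same substring filter can be sketched with the standard library alone. This is only a rough illustration, not the method from the answer: `html.parser` is swapped in for BeautifulSoup so the example runs without third-party packages or a network request, and the `LinkCollector` class, the `filter_links` helper, and the inline `SAMPLE_HTML` string are hypothetical names made up for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag (hypothetical helper class)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def filter_links(html, word):
    """Return the hrefs in `html` that contain `word` as a substring."""
    collector = LinkCollector()
    collector.feed(html)
    return [href for href in collector.hrefs if word in href]


# Inline sample standing in for the downloaded page (an assumption).
SAMPLE_HTML = """
<a href="/website-builder?linkId=hd.mainnav">Builder</a>
<a href="/hosting">Hosting</a>
<a name="no-href">Anchor without href</a>
"""

matches = filter_links(SAMPLE_HTML, "/website-builder")
for href in matches:
    print(href)
```

The filter itself is still the plain `in` test from the answer; only the link extraction differs.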
Answer 1: (score: 0)
Here is what I came up with:
# Tag.find() searches for child tags and never returns -1, so test the href string instead.
links = [link for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')) if 'word' in link.get('href', '')]
print links
Of course, you should replace "word" with whatever term you want to filter on.
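The question asks for links containing "certain words", so the single-word test may need to cover several terms. A minimal sketch of that, assuming the hrefs have already been extracted into a plain list of strings; the `contains_any` helper and the sample values are hypothetical and not part of the original answer:

```python
def contains_any(href, words):
    """Return True if at least one of `words` occurs in `href`."""
    return any(word in href for word in words)


# Hypothetical hrefs, standing in for the values of link['href'].
hrefs = ["/word/page1", "/about", "/blog/other-word", "/contact"]
wanted = ["word", "contact"]

filtered = [href for href in hrefs if contains_any(href, wanted)]
print(filtered)
```

Swapping `any` for `all` would instead keep only the links that contain every word in the list.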
Answer 2: (score: 0)
Full code:
import httplib2
from BeautifulSoup import BeautifulSoup, SoupStrainer

http = httplib2.Http()
status, response = http.request('http://www.google.com')

for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    if link.has_key('href'):
        if '/website-builder' in link['href']:
            print link['href']