This code extracts all the href links from a website, given the site's URL.
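```python
# Python 3 with BeautifulSoup 4 (pip install beautifulsoup4)
import re
from urllib.request import urlopen

from bs4 import BeautifulSoup

html_page = urlopen("http://kteq.in/services")
soup = BeautifulSoup(html_page, "html.parser")
for link in soup.find_all('a'):
    href = link.get('href')
    if href is None:
        # skip <a> tags that have no href attribute
        continue
    # remove "http..." absolute URLs; relative links pass through unchanged
    result = re.sub(r"http\S+", "", href)
    print(result)
```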
When I run the code above, it extracts the href links from that website. I get the following output:
index
index
#
solutions#internet-of-things
solutions#online-billing-and-payment-solutions
solutions#customer-relationship-management
solutions#enterprise-mobility
solutions#enterprise-content-management
solutions#artificial-intelligence
solutions#b2b-and-b2c-web-portals
solutions#robotics
solutions#augement-reality-virtual-reality
solutions#azure
solutions#omnichannel-commerce
solutions#document-management
solutions#enterprise-extranets-and-intranets
solutions#business-intelligence
solutions#enterprise-resource-planning
services
clients
contact
#
#
#
#
#
#
#
#contactform
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
index
services
#
contact
#
iOSDevelopmentServices
AndroidAppDevelopment
WindowsAppDevelopment
HybridSoftwareSolutions
CloudServices
HTML5Development
iPadAppDevelopment
services
services
services
services
services
services
contact
contact
contact
contact
contact
#
#
#
#
Now I need to extract the CSS from these href links. For example, I need to extract the CSS from the 'index' href link in the output above. Any suggestions?
Answer (score: 0)
You can loop through all the href links you have collected, fetch each of those pages, and pull the CSS links out of them.
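A minimal sketch of that loop in Python 3, assuming the hrefs are resolved against the question's base URL `http://kteq.in/services`. It swaps in the standard-library `html.parser` for BeautifulSoup so it runs without extra packages, assumes pages are UTF-8, and the helper names (`StylesheetParser`, `css_links`, `fetch`, `collect_css`) are hypothetical. Each href is resolved with `urljoin`, its `#fragment` is dropped so `solutions#robotics` and `solutions#azure` lead to a single fetch of `solutions`, and the `href` of every `<link rel="stylesheet">` tag is collected.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class StylesheetParser(HTMLParser):
    """Collect the href of every <link rel="stylesheet"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.css = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        # rel may hold several space-separated tokens and may be missing/None
        rels = (attrs.get("rel") or "").lower().split()
        if "stylesheet" in rels and attrs.get("href"):
            self.css.append(attrs["href"])


def css_links(html):
    """Return the stylesheet hrefs found in an HTML string, in document order."""
    parser = StylesheetParser()
    parser.feed(html)
    return parser.css


def fetch(url):
    """Download a page and decode it, assuming UTF-8."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def collect_css(base_url, hrefs, get_page=fetch):
    """Visit each distinct page in hrefs and map its URL to its CSS links."""
    results = {}
    for href in hrefs:
        # "index" -> "http://kteq.in/index"; "#fragment" parts are dropped,
        # and a bare "#" or "" resolves back to base_url itself
        url, _fragment = urldefrag(urljoin(base_url, href))
        if url in results:
            continue  # many hrefs point at the same page
        results[url] = css_links(get_page(url))
    return results
```

For example, `collect_css("http://kteq.in/services", hrefs)`, with `hrefs` set to the values printed by the question's loop, returns a dict mapping each page URL (such as `http://kteq.in/index`) to its list of stylesheet paths. Passing a different `get_page` function is a simple way to add error handling or caching without touching the parsing.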
From the index page, I got the following CSS links:
Output:
bootstrap/bootstrap.min.css
https://maxcdn.bootstrapcdn.com/font-awesome/4.7.0/css/font-awesome.min.css
https://cdn.linearicons.com/free/1.0.0/icon-font.min.css
//fonts.googleapis.com/css
cards/card.css
GalleryStyle/set1.css
css/custom.css
page-transition/css/component.css
page-transition/css/animations.css
https://cdnjs.cloudflare.com/ajax/libs/normalize/5.0.0/normalize.min.css
https://cdnjs.cloudflare.com/ajax/libs/slick-carousel/1.5.5/slick.min.css
css/scrollpage.css
css/changingtext.css
css/color-slider.css
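These stylesheet paths come in three forms: relative (`css/custom.css`), protocol-relative (`//fonts.googleapis.com/css`) and absolute (`https://...`). To download the CSS itself, each one has to be resolved against the URL of the page it was found on. A minimal sketch, assuming that page is `http://kteq.in/index` and that the page does not declare a `<base href>` (which would change the resolution); `absolute_css_urls` and `download_css` are hypothetical helper names.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def absolute_css_urls(page_url, css_hrefs):
    """Resolve stylesheet hrefs against the page they were found on.

    Relative paths are joined onto the page's directory, protocol-relative
    ones ("//host/path") inherit the page's scheme, and absolute URLs are
    returned unchanged.
    """
    return [urljoin(page_url, href.strip()) for href in css_hrefs]


def download_css(urls):
    """Fetch the text of each stylesheet (assumes UTF-8 encoding)."""
    texts = {}
    for url in urls:
        with urlopen(url) as response:
            texts[url] = response.read().decode("utf-8", errors="replace")
    return texts
```

With that base, `bootstrap/bootstrap.min.css` becomes `http://kteq.in/bootstrap/bootstrap.min.css` and `//fonts.googleapis.com/css` becomes `http://fonts.googleapis.com/css`, so the resulting list can be passed straight to `download_css`.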