How to retrieve text with BeautifulSoup in Python

Time: 2015-07-28 10:01:18

Tags: python selenium beautifulsoup

I want to use BeautifulSoup to extract the text repairsonwheelsrim-hub.com from a piece of HTML. How can I do this? Currently I am using


1 answer:

Answer 0: (score: 1)

This is what you want:

from bs4 import BeautifulSoup

# HTML snippet containing the link whose text we want
text = '<div class="biz-website"> <span class="offscreen">Business website</span> <a target="_blank" href="/biz_redir?url=http%3A%2F%2Frepairsonwheelsrim-hub.com&src_bizid=8tY2YtXPk1rGO7sl43LH8A&cachebuster=1438073532&s=6b75d47d32b28eb8e50506859857b75e949d698cdbc47e9892cc2a3b43e480c2">repairsonwheelsrim-hub.com</a> </div>'
soup = BeautifulSoup(text, 'html.parser')
# soup.a is the first <a> tag in the document; .text is its visible text
print(soup.a.text)

Output:

repairsonwheelsrim-hub.com
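
Note that soup.a only returns the first <a> tag in the whole document, so on a full page it may pick up an unrelated link. A minimal sketch of scoping the lookup to the biz-website div with a CSS selector follows. The sample HTML (including the extra "Home" link) and the helper name website_text are made up for illustration, and select_one assumes a reasonably recent bs4 release:

```python
from bs4 import BeautifulSoup

# Hypothetical sample page: an unrelated link appears before the one we want.
html = (
    '<a href="/home">Home</a>'
    '<div class="biz-website">'
    '<span class="offscreen">Business website</span> '
    '<a target="_blank" '
    'href="/biz_redir?url=http%3A%2F%2Frepairsonwheelsrim-hub.com">'
    'repairsonwheelsrim-hub.com</a>'
    '</div>'
)

soup = BeautifulSoup(html, "html.parser")


def website_text(soup):
    """Return the text of the link inside div.biz-website, or None if absent."""
    link = soup.select_one("div.biz-website a")
    return link.get_text(strip=True) if link is not None else None


# soup.a is simply the first <a> in the document, here the unrelated one.
first_link_text = soup.a.text
site = website_text(soup)
print(first_link_text)
print(site)
```

Returning None when the div is missing avoids an AttributeError on pages that do not list a website.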

To loop over the text of every link:

from bs4 import BeautifulSoup

text = '<div class="biz-website"> <span class="offscreen">Business website</span> <a target="_blank" href="/biz_redir?url=http%3A%2F%2Frepairsonwheelsrim-hub.com&src_bizid=8tY2YtXPk1rGO7sl43LH8A&cachebuster=1438073532&s=6b75d47d32b28eb8e50506859857b75e949d698cdbc47e9892cc2a3b43e480c2">repairsonwheelsrim-hub.com</a> </div>'
soup = BeautifulSoup(text, 'html.parser')
# find_all returns every <a> tag; findAll is the older alias for the same method
for t in soup.find_all("a"):
    print(t.text)
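
If you also need the address the link points to, and not only its visible text, the same loop can read the href attribute. The sketch below assumes the redirect link keeps the real address in a url query parameter, as in the snippet above; the href is shortened to that one parameter, and the function name link_texts_and_targets is made up for illustration:

```python
from urllib.parse import parse_qs, urlparse

from bs4 import BeautifulSoup

# Simplified copy of the snippet above: only the url parameter is kept in href.
html = (
    '<div class="biz-website">'
    '<a target="_blank" '
    'href="/biz_redir?url=http%3A%2F%2Frepairsonwheelsrim-hub.com">'
    'repairsonwheelsrim-hub.com</a>'
    '</div>'
)


def link_texts_and_targets(markup):
    """Return (text, decoded url parameter or None) for every <a> tag."""
    soup = BeautifulSoup(markup, "html.parser")
    results = []
    for a in soup.find_all("a"):
        query = urlparse(a.get("href", "")).query
        # parse_qs percent-decodes the values and returns a list per key
        target = parse_qs(query).get("url", [None])[0]
        results.append((a.get_text(strip=True), target))
    return results


pairs = link_texts_and_targets(html)
for text, target in pairs:
    print(text, target)
```

Links without a url parameter come back with None as the second element, so they are easy to filter out.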

For more on BS4, see the official site.

Edit

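# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

# Download the page; the request needs network access and the markup may have changed
response = requests.get("http://www.yelp.com/biz/scotts-pizza-tours-new-york")
text = response.content

soup = BeautifulSoup(text, 'html.parser')
# Keep only <a> tags that carry a target attribute
for t in soup.find_all(lambda tag: tag.name == 'a' and 'target' in tag.attrs):
    # Compare exactly; a substring test would also match values such as "_" or "blank"
    if t["target"] == "_blank":
        print(t.get_text())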

Output:

scottspizzatours.com
scottspizzatours.com
scottspizzatours.com/pri…
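
The same filter can probably be written more compactly by passing the attribute as a keyword argument to find_all. Here is a rough offline sketch. The sample HTML is invented so that it runs without a network request, example.com/tours is a placeholder, and the de-duplication step is an extra not present in the code above:

```python
from bs4 import BeautifulSoup

# Invented sample markup standing in for the downloaded page.
html = """
<a href="/login">Log in</a>
<a target="_blank" href="#">scottspizzatours.com</a>
<a target="_self" href="#">Same window</a>
<a target="_blank" href="#">scottspizzatours.com</a>
<a target="_blank" href="#">example.com/tours</a>
"""

soup = BeautifulSoup(html, "html.parser")

# Keyword arguments to find_all filter on attribute values (exact match).
texts = [a.get_text(strip=True) for a in soup.find_all("a", target="_blank")]

# dict.fromkeys drops repeats while keeping first-seen order (Python 3.7+).
unique_texts = list(dict.fromkeys(texts))

for t in unique_texts:
    print(t)
```

Links without a target attribute, or with a different value, are skipped by the keyword filter, so no separate check is needed inside the loop.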