I need to extract the complete text from the following HTML, leaving out the markup itself (<p>, <a href>, rel, and so on).
<p>Many of the features that made the Samsung Galaxy S4 one of the most anticipated phones in recent history -- such as its 5-inch 1920 x 1080 <a href="http://www.bubblews.com/news/421662-samsung-galaxy-s4-worlds-first-full-hd-super-amoled-display" rel="nofollow" target="_blank">Full HD Super AMOLED display</a>, its powerful processors (<a href="http://www.samsung.com/global/business/semiconductor/minisite/Exynos/blog_Spotlight_on_the_Exynos5Octa.html" rel="nofollow" target="_blank">Samsung Exynos 5 Octa</a> in the international version and <a href="http://www.qualcomm.com/snapdragon/blog/topics/snapdragon 600" rel="nofollow" target="_blank">Qualcomm Snapdragon 600</a> in the U.S. version) and 16GB, 32GB and 64GB storage options -- are now bringing grief to those who rushed to purchase the fourth-generation Galaxy S series smartphone upon its late April release.</p>
I tried the following code:
from bs4 import BeautifulSoup
from urllib2 import urlopen

BASE_URL = "http://www.chicagoreader.com"

def get_category_links(section_url):
    html = urlopen(section_url).read()
    soup = BeautifulSoup(html, "lxml")
    for div in soup.find_all("div", attrs={'class': 'field-content'}):
        # contents[0] is only the first child node of the <p> tag
        print(div.find("p").contents[0])
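But it produces the following output:
Many of the features that made the Samsung Galaxy S4 one of the most anticipated phones in recent history -- such as its 5-inch 1920 x 1080
I can't get the complete text. The output should also include the text inside and after the <a> tags (the ones carrying href, rel, and so on). Please suggest how I can get the following output:
Many of the features that made the Samsung Galaxy S4 one of the most anticipated phones in recent history -- such as its 5-inch 1920 x 1080 Full HD Super AMOLED display, its powerful processors (Samsung Exynos 5 Octa in the international version and Qualcomm Snapdragon 600 in the U.S. version) and 16GB, 32GB and 64GB storage options -- are now bringing grief to those who rushed to purchase the fourth-generation Galaxy S series smartphone upon its late April release.
Thanks.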
Answer 0 (score: 3)
You can use .text:
>>> from bs4 import BeautifulSoup
>>> html = '<p>Many of the features that made the Samsung Galaxy S4 one of the most anticipated phones in recent history -- such as its 5-inch 1920 x 1080 <a href="http://www.bubblews.com/news/421662-samsung-galaxy-s4-worlds-first-full-hd-super-amoled-display" rel="nofollow" target="_blank">Full HD Super AMOLED display</a>, its powerful processors (<a href="http://www.samsung.com/global/business/semiconductor/minisite/Exynos/blog_Spotlight_on_the_Exynos5Octa.html" rel="nofollow" target="_blank">Samsung Exynos 5 Octa</a> in the international version and <a href="http://www.qualcomm.com/snapdragon/blog/topics/snapdragon 600" rel="nofollow" target="_blank">Qualcomm Snapdragon 600</a> in the U.S. version) and 16GB, 32GB and 64GB storage options -- are now bringing grief to those who rushed to purchase the fourth-generation Galaxy S series smartphone upon its late April release.</p>'
>>> soup = BeautifulSoup(html, "html.parser")
>>> print(soup.p.text)
Many of the features that made the Samsung Galaxy S4 one of the most anticipated phones in recent history -- such as its 5-inch 1920 x 1080 Full HD Super AMOLED display, its powerful processors (Samsung Exynos 5 Octa in the international version and Qualcomm Snapdragon 600 in the U.S. version) and 16GB, 32GB and 64GB storage options -- are now bringing grief to those who rushed to purchase the fourth-generation Galaxy S series smartphone upon its late April release.
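To see why the original attempt stopped at "1920 x 1080", it may help to compare the first child node of the paragraph with the result of get_text() (which is what .text returns). The snippet below is only a minimal sketch: the HTML is a shortened, hypothetical stand-in for the real page, the example.com URL is a placeholder, and paragraph_texts is a helper name made up for illustration. It uses the built-in "html.parser" so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical stand-in for the page in the question.
html = (
    '<div class="field-content">'
    '<p>such as its 5-inch 1920 x 1080 '
    '<a href="http://example.com/display" rel="nofollow">Full HD Super AMOLED display</a>, '
    'its powerful processors</p>'
    '</div>'
)

soup = BeautifulSoup(html, "html.parser")
p = soup.p

# .contents holds the direct children of <p>: a text node, the <a> tag, a text node.
first_child = p.contents[0]

# get_text() joins every text node under <p>, including the link text.
full_text = p.get_text()


def paragraph_texts(markup):
    """Return the full text of the first <p> in each div.field-content.

    Hypothetical helper mirroring the loop in the question.
    """
    parsed = BeautifulSoup(markup, "html.parser")
    texts = []
    for div in parsed.find_all("div", class_="field-content"):
        para = div.find("p")
        if para is not None:  # skip divs that contain no paragraph
            texts.append(para.get_text())
    return texts


print(repr(first_child))
print(full_text)
print(paragraph_texts(html))
```

In the question's loop, replacing the contents[0] lookup with div.find("p").get_text() should therefore give the whole sentence rather than just the text before the first link.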