Question

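I'm working on a project for my food science class that requires me to do research, but why do it by hand when a script can do it for me? Anyway, I'm using Python 2.7 with BeautifulSoup and urllib2, and I need help figuring out how to print only the content between the tags, not the tags themselves, so that I can copy and paste what it scrapes into my Google Doc. Here is the code I'm using. Any help is greatly appreciated, thanks!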

import urllib2, time
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3

print("BELLY-FAT-CURE")
url = urllib2.urlopen("http://www.webmd.com/diet/belly-fat-diet")

content = url.read()

soup = BeautifulSoup(content)
headers = soup.findAll("h3")
texts = soup.findAll("p")

print(headers)
print(texts)
time.sleep(5)

print("CABBAGE SOUP DIET INFO")
url = urllib2.urlopen("http://www.webmd.com/diet/cabbage-soup-diet")
content1 = url.read()

soup1 = BeautifulSoup(content1)
headers1 = soup1.findAll("h3")  # search the cabbage soup page, not the first one
texts1 = soup1.findAll("p")
print(headers1)
print(texts1)

Answer 1

Read the value of each element's `text` attribute:

import urllib2
from bs4 import BeautifulSoup

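# Name the parser explicitly so bs4 does not warn or pick a different one per machine.
soup = BeautifulSoup(urllib2.urlopen("http://www.webmd.com/diet/belly-fat-diet"), "html.parser")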

print([header.text for header in soup.find_all("h3")])
print([p.text for p in soup.find_all("p")])

Prints:

[u'The Promise', u'Does It Work?', ... ]
[u'Common Conditions', u'Featured Topics', ... ]
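To see the difference between printing a tag and printing its text without hitting the network, here is a small sketch run against an inline HTML string. The markup is made up for illustration (it is not taken from WebMD), and the sketch assumes BeautifulSoup 4 on Python 3. It also shows a gotcha: `get_text(strip=True)` joins the stripped pieces with an empty string, so words on either side of a nested tag can run together unless you pass a separator.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page.
html = """
<h3>The Promise</h3>
<p>Cut <b>belly fat</b> in weeks.</p>
"""

soup = BeautifulSoup(html, "html.parser")
p = soup.find("p")

as_tag = str(p)                     # the whole element, markup included
as_text = p.text                    # only the text; the nested <b> is flattened
squashed = p.get_text(strip=True)   # strips each piece, joins with "" so words run together
tidy = p.get_text(" ", strip=True)  # strips each piece, joins with a single space

print(as_tag)
print(as_text)
print(squashed)
print(tidy)
```

For text you intend to paste into a document, `get_text(" ", strip=True)` is usually the safer choice.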

Note that in this example I'm using BeautifulSoup 4, which is also the version you should use: version 3 is no longer developed or maintained.
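Since the goal is to paste the result into a document, it may help to keep each heading next to the paragraphs that follow it instead of printing two separate lists. A minimal sketch, assuming BeautifulSoup 4 on Python 3: `find_all` accepts a list of tag names and returns the matches in document order. The function name `page_text` and the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup


def page_text(html):
    """Return the text of every <h3> and <p>, in page order, one per line (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # A list of names matches any of them; results come back in document order.
    lines = [tag.get_text(" ", strip=True) for tag in soup.find_all(["h3", "p"])]
    # Skip elements that contain no visible text.
    return "\n".join(line for line in lines if line)


sample = (
    "<h3>The Promise</h3><p>Eat less sugar.</p><p></p>"
    "<h3>Does It Work?</h3><p>Maybe.</p>"
)
print(page_text(sample))
```

On Python 3, `urllib2` is gone; you would pass the bytes from `urllib.request.urlopen(url).read()` to `page_text` instead of the sample string.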
