This is what I'm still left with.
table = soup.findAll('div', attrs={"class": "five columns"})
for data in table:
    para = data.findAll('p')
    print(para)
Answer 0 (score: 0)
You can try using the .text attribute of the BeautifulSoup object. I additionally split the key-value pairs with the re.split() function; if you don't want to split them, just use para.text.
from bs4 import BeautifulSoup
import re
a = """<p><span class="four">Location: </span> <span id="wt-loc" title="New Delhi / Safdarjung">New Delhi / Safdarjung</span></p>, <p><span class="four">Current Time: </span> <span id="wtct">Feb 12, 2017 at 10:29:52 am</span></p>, <p><span class="four">Latest Report: </span> Feb 12, 2017 at 8:30 am</p>, <p><span class="four">Visibility: </span> 1 km</p>, <p><span class="four">Pressure: </span> 102.12 kPa</p>, <p><span class="four">Humidity: </span> 95%</p>, <p><span class="four">Dew Point: </span> 10 °C</p>"""
soup = BeautifulSoup(a, 'html.parser')
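# Split on each ", " that is followed by a capitalized label, so the comma inside "Feb 12, 2017" is left alone
print(re.split(r', (?=\s*[A-Z])', soup.text))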
Answer 1 (score: 0)
Use .text to get all the text under a p tag. All you need to do is iterate over the result of find_all('p'):
from bs4 import BeautifulSoup
html = '''<p><span class="four">Location: </span> <span id="wt-loc" title="New Delhi / Safdarjung">New Delhi / Safdarjung</span></p>, <p><span class="four">Current Time: </span> <span id="wtct">Feb 12, 2017 at 10:29:52 am</span></p>, <p><span class="four">Latest Report: </span> Feb 12, 2017 at 8:30 am</p>, <p><span class="four">Visibility: </span> 1 km</p>, <p><span class="four">Pressure: </span> 102.12 kPa</p>, <p><span class="four">Humidity: </span> 95%</p>, <p><span class="four">Dew Point: </span> 10 °C</p>'''
soup = BeautifulSoup(html, 'lxml')
for p in soup.find_all('p'):
    print(p.text)
Output:
Location: New Delhi / Safdarjung
Current Time: Feb 12, 2017 at 10:29:52 am
Latest Report: Feb 12, 2017 at 8:30 am
Visibility: 1 km
Pressure: 102.12 kPa
Humidity: 95%
Dew Point: 10 °C
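If the labels and values are needed separately rather than as printed lines, each paragraph's text can be split on its first colon. The following is only a minimal sketch, assuming the same markup as above (shortened here): the parse_pairs helper name is hypothetical, and html.parser is used just to avoid the lxml dependency.

```python
from bs4 import BeautifulSoup

html = '''<p><span class="four">Location: </span> <span id="wt-loc" title="New Delhi / Safdarjung">New Delhi / Safdarjung</span></p>, <p><span class="four">Current Time: </span> <span id="wtct">Feb 12, 2017 at 10:29:52 am</span></p>, <p><span class="four">Visibility: </span> 1 km</p>, <p><span class="four">Humidity: </span> 95%</p>'''


def parse_pairs(markup):
    """Hypothetical helper: return {label: value} for every <p> whose text contains a colon."""
    soup = BeautifulSoup(markup, 'html.parser')
    pairs = {}
    for p in soup.find_all('p'):
        # partition() splits on the first colon only, so values that
        # contain colons themselves (such as times) stay intact.
        label, sep, value = p.get_text().partition(':')
        if sep:
            pairs[label.strip()] = value.strip()
    return pairs


pairs = parse_pairs(html)
print(pairs)
```

Splitting per paragraph sidesteps the question of which commas separate fields, since each p tag already holds exactly one pair.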
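Answer 2 (score: 0)
Beautiful Soup has a method called get_text() that returns all the text inside a tag while ignoring any nested tags. Just call p.get_text(). If you also want to strip the surrounding whitespace, use p.get_text(strip=True).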
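One detail that may be worth checking: with strip=True the individual text pieces are stripped and then joined with the separator, which is empty by default, so a label and its value can end up glued together. A small sketch of the difference, assuming markup like the one in the question and using html.parser; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

html = '<p><span class="four">Location: </span> <span id="wt-loc">New Delhi / Safdarjung</span></p>'
p = BeautifulSoup(html, 'html.parser').find('p')

raw = p.get_text()                    # whitespace kept exactly as in the markup
joined = p.get_text(strip=True)       # pieces stripped, joined with no separator
spaced = p.get_text(' ', strip=True)  # pieces stripped, joined with one space

print(repr(raw))
print(repr(joined))
print(repr(spaced))
```

Passing a single space as the separator together with strip=True tends to give the tidiest result for this kind of label/value markup.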