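Using Beautiful Soup and isolating my web source data inside the 'p' tags, I managed to retrieve the data I need. Now I want to iterate over the remaining data in the variable 'table' (over every row and every cell) to scrape the data into a list. Can anyone help me with how to achieve this? I have read several other posts but could not apply them to my specific problem... Thanks.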
from bs4 import BeautifulSoup
import urllib2
url = "http://www.gks.ru/bgd/free/B00_25/IssWWW.exe/Stg/d000/000715.HTM"
page = urllib2.urlopen(url)
soup = BeautifulSoup(page.read(), 'html.parser')
table=soup.findAll('p',text=True)
print(table)
Answer 0 (score: 2)
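Assuming you want the monthly price data, you need to find all tr elements inside the table and skip the first 3 (the header rows). Note that html.parser did not work for me, but lxml did (see Differences between parsers):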
soup = BeautifulSoup(page, 'lxml') # requires 'lxml' to be installed
table = soup.find("center").find("table")
for row in table.find_all("tr")[3:]:
    cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
    print(cells)
Prints:
['January', '469,4', '15,0', '3,9']
['February', '479,8', '16,7', '2,2']
['March', '485,6', '16,9', '1,2']
['April', '487,8', '16,4', '0,5']
['May', '489,5', '15,8', '0,4']
['June', '490,5', '15,3', '0,2']
['July', '494,4', '15,6', '0,8']
['August', '496,1', '15,8', '0,4']
['September', '499,0', '15,7', '0,6']
['October', '502,7', '15,6', '0,7']
['November', '506,4', '15,0', '0,8']
['December', '', '', '']
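Since the question asks for the data in a list rather than printed, one option is to append each `cells` list to a list inside the loop and convert the values afterwards. Below is a minimal sketch of that conversion step, assuming the numeric cells use a decimal comma and that empty cells should become `None`. The helper names `parse_decimal` and `rows_to_records` are hypothetical, and the sample rows are copied from the output above so the snippet runs without fetching the page.

```python
def parse_decimal(text):
    """Convert a decimal-comma string such as '469,4' to a float.

    Empty cells (e.g. the December row above) become None.
    """
    text = text.strip()
    if not text:
        return None
    return float(text.replace(",", "."))


def rows_to_records(rows):
    """Turn lists of cell strings into [label, number, number, ...] lists."""
    records = []
    for cells in rows:
        if not cells:  # skip rows that had no <td> cells
            continue
        label, values = cells[0], cells[1:]
        records.append([label] + [parse_decimal(value) for value in values])
    return records


# Sample rows copied from the printed output above; in the real loop you
# would collect them with rows.append(cells) instead of print(cells).
sample_rows = [
    ["January", "469,4", "15,0", "3,9"],
    ["November", "506,4", "15,0", "0,8"],
    ["December", "", "", ""],
]

data = rows_to_records(sample_rows)
for record in data:
    print(record)
```

Keeping the scraping and the conversion separate means the raw strings are still available if the page ever contains a value that does not parse as a number.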