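I am trying to extract the data from the table (table 4) at the URL, with "Period" and "Percent per annum" as the columns.
I think I am confused about how to reference the row above the first date and the corresponding figures, so I get the error AttributeError: 'NoneType' object has no attribute 'getText' on the line row_name = row.findNext('td.header_units').getText(). My code is as follows: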
    from bs4 import BeautifulSoup
    import urllib2

    url = "http://sdw.ecb.europa.eu/browseTable.do?node=qview&SERIES_KEY=165.YC.B.U2.EUR.4F.G_N_A.SV_C_YM.SR_30Y"
    content = urllib2.urlopen(url).read()
    soup = BeautifulSoup(content)
    desired_table = soup.findAll('table')[4]

    # Find the columns you want data from
    headers1 = desired_table.findAll('td.header_units')
    headers2 = desired_table.findAll('td.header')
    desired_columns = []
    for th in headers1:  # I'm just working with `headers1` currently to see if I have the right idea
        desired_columns.append([headers1.index(th), th.getText()])

    # Iterate through each row grabbing the data from the desired columns
    rows = desired_table.findAll('tr')
    for row in rows[1:]:
        cells = row.findAll('td')
        row_name = row.findNext('td.header_units').getText()
        for column in desired_columns:
            print(cells[column[0]].text.encode('ascii', 'ignore'), row_name.encode('ascii', 'ignore'), column[1].encode('ascii', 'ignore'))
Thanks

Answer 0 (score: 1)

This collects every row of the table as a tuple:
    from bs4 import BeautifulSoup
    import requests

    r = requests.get(
        "http://sdw.ecb.europa.eu/browseTable.do?node=qview&SERIES_KEY=165.YC.B.U2.EUR.4F.G_N_A.SV_C_YM.SR_30Y")
    soup = BeautifulSoup(r.content)

    # Start at the first header cell of the stats table and walk every <tr> after it
    data = iter(soup.find("table", {"class": "tablestats"}).find("td", {"class": "header"}).find_all_next("tr"))
    # The first two rows are consumed as the column headers
    headers = (next(data).text, next(data).text)
    # The unpacking assumes each remaining row has exactly two <td> cells
    table_items = [(a.text, b.text) for ele in data for a, b in [ele.find_all("td")]]

    for a, b in table_items:
        # Print "null" when the value cell is blank
        print(u"Period={}, Percent per annum={}".format(a, b if b.strip() else "null"))
Output:

    Period=2015-06-09, Percent per annum=1.842026
    Period=2015-06-08, Percent per annum=1.741636
    Period=2015-06-07, Percent per annum=null
    Period=2015-06-06, Percent per annum=null
    Period=2015-06-05, Percent per annum=1.700042
    Period=2015-06-04, Percent per annum=1.667431
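
As for the AttributeError in the question: the first argument of find, findNext and findAll is a tag name, not a CSS selector, so 'td.header_units' searches for a tag literally named td.header_units, finds nothing, and returns None. The sketch below illustrates this on a small inline table. SAMPLE_HTML is a made-up stand-in, not the real ECB markup, and the "html.parser" backend is my assumption, so treat it as an illustration rather than a drop-in fix.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the ECB table; the real page may differ.
SAMPLE_HTML = """
<table class="tablestats">
  <tr><td class="header">Period</td><td class="header">Value</td></tr>
  <tr><td class="header_units">Unit</td><td class="header_units">Percent per annum</td></tr>
  <tr><td>2015-06-09</td><td>1.842026</td></tr>
  <tr><td>2015-06-07</td><td></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table", {"class": "tablestats"})

# Treated as a tag name, so nothing matches and the result is None.
# Calling .getText() on it is what raises the AttributeError.
missing = table.find("td.header_units")

# Two ways to match on the class instead:
by_attrs = table.find("td", {"class": "header_units"})  # attribute filter
by_css = table.select_one("td.header_units")            # real CSS selector

# Collect (period, value) pairs, skipping rows whose cells carry a class
# (the header rows in this sample) and marking blank values as "null".
rows = []
for tr in table.find_all("tr"):
    cells = tr.find_all("td")
    if len(cells) != 2 or cells[0].get("class"):
        continue
    period = cells[0].get_text(strip=True)
    value = cells[1].get_text(strip=True)
    rows.append((period, value or "null"))

print(missing)
print(by_attrs.get_text(), "|", by_css.get_text())
print(rows)
```

In the original code, the same change would apply to every 'td.header_units' and 'td.header' lookup: either pass the class as an attribute filter or switch to select / select_one.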