Question

I'm trying to pull the data from the table at the URL below, with "Period" and "Percent per annum" (table 4) as the columns:

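My code is below, but I think I'm confused about how to reference the row above the first date and the corresponding numbers. As a result I get the error AttributeError: 'NoneType' object has no attribute 'getText' on the line row_name = row.findNext('td.header_units').getText()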

from bs4 import BeautifulSoup
import urllib2 

url = "http://sdw.ecb.europa.eu/browseTable.do?node=qview&SERIES_KEY=165.YC.B.U2.EUR.4F.G_N_A.SV_C_YM.SR_30Y"

content = urllib2.urlopen(url).read()
soup = BeautifulSoup(content)

desired_table = soup.findAll('table')[4]

# Find the columns you want data from
headers1 = desired_table.findAll('td.header_units')
headers2 = desired_table.findAll('td.header')
desired_columns = []
for th in headers1: #I'm just working with `headers1` currently to see if I have the right idea
    desired_columns.append([headers1.index(th), th.getText()])

# Iterate through each row grabbing the data from the desired columns
rows = desired_table.findAll('tr')

for row in rows[1:]:
    cells = row.findAll('td')
    row_name = row.findNext('td.header_units').getText()
    for column in desired_columns:
        print(cells[column[0]].text.encode('ascii', 'ignore'), row_name.encode('ascii', 'ignore'), column[1].encode('ascii', 'ignore'))

Thanks

Answer 1

This collects all of the elements into tuples:

from bs4 import BeautifulSoup
import requests

r = requests.get(
    "http://sdw.ecb.europa.eu/browseTable.do?node=qview&SERIES_KEY=165.YC.B.U2.EUR.4F.G_N_A.SV_C_YM.SR_30Y")
# Name the parser explicitly so bs4 does not guess one and emit a warning
soup = BeautifulSoup(r.content, "html.parser")

# Start at the first "header" cell and iterate over every <tr> that follows it
data = iter(soup.find("table", {"class": "tablestats"}).find("td", {"class": "header"}).find_all_next("tr"))

# Consume the next two rows as headers; each remaining row unpacks into two cells
headers = (next(data).text, next(data).text)
table_items = [(a.text, b.text) for ele in data for a, b in [ele.find_all("td")]]

for a, b in table_items:
    print(u"Period={}, Percent per annum={}".format(a, b if b.strip() else "null"))

Output:

Period=2015-06-09, Percent per annum=1.842026
Period=2015-06-08, Percent per annum=1.741636
Period=2015-06-07, Percent per annum=null
Period=2015-06-06, Percent per annum=null
Period=2015-06-05, Percent per annum=1.700042
Period=2015-06-04, Percent per annum=1.667431
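
As for the AttributeError itself: find(), findAll() and findNext() take a tag name as their first argument, not a CSS selector. So 'td.header_units' is looked up as a tag literally named that, nothing matches, and findNext() returns None. Here is a minimal sketch of the difference. It uses a small made-up HTML snippet (SAMPLE_HTML is my own stand-in, not the real ECB markup) so that it runs offline:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the ECB table; the real page's markup may differ.
SAMPLE_HTML = """
<table class="tablestats">
  <tr><td class="header">Period</td><td class="header_units">Percent per annum</td></tr>
  <tr><td>2015-06-09</td><td>1.842026</td></tr>
  <tr><td>2015-06-07</td><td></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table", {"class": "tablestats"})

# 'td.header_units' is treated as a literal tag name, so nothing matches.
no_match = table.find("td.header_units")

# Either pass the class separately, or use select_one() with a CSS selector.
by_class = table.find("td", class_="header_units")
by_css = table.select_one("td.header_units")

print(no_match)             # None, which is why .getText() raised AttributeError
print(by_class.get_text())
print(by_css.get_text())
```

The same applies to findAll('td.header_units') in the question: it returns an empty list, so desired_columns never gets filled. Passing the class as a separate argument, or switching to select(), should fix both calls.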
