Question

I'm trying to extract specific table values from a website (http://www.forexfactory.com/calendar.php?day=nov18.2016) using Python and BeautifulSoup.

Here is my code so far:

from bs4 import BeautifulSoup
from urllib.request import urlopen

content = urlopen("http://www.forexfactory.com/calendar.php?day=nov18.2016").read()
soup = BeautifulSoup(content, 'html.parser')

tables = soup.find_all("table")
for table in tables:
    # Print only top-level tables, skipping any table nested inside another table
    if table.find_parent("table") is None:
        print(table)

I can print all of the tables and their contents, but how do I get just the table whose class is "calendar__table" and iterate over it to get each row and its values?

Answer 1

You can pass the attributes to match as a second argument to the search:

tables = soup.find_all("table", {'class':'calendar__table'})
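A minimal offline sketch of the same filter, assuming a small made-up HTML snippet in place of the live page (the markup is hypothetical, not Forex Factory's real source). It shows that the `attrs` dictionary and the `class_` keyword select the same table, and that `find` returns the first match (or `None`) while `find_all` always returns a list.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real calendar page.
HTML = """
<table class="layout"><tr><td>outer</td></tr></table>
<table class="calendar__table">
  <tr><td>2:00am</td><td>EUR</td><td>German PPI m/m</td></tr>
  <tr><td>4:00am</td><td>EUR</td><td>Current Account</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Two equivalent ways to filter on the class attribute; both return a list.
by_attrs = soup.find_all("table", {"class": "calendar__table"})
by_keyword = soup.find_all("table", class_="calendar__table")

# find() returns the first matching Tag, or None when nothing matches.
first = soup.find("table", class_="calendar__table")
missing = soup.find("table", class_="no_such_class")

print(len(by_attrs), by_attrs == by_keyword, first is by_attrs[0], missing)
# 1 True True None
```

When only one table is expected, `find` avoids the outer loop, but its result should be checked for `None` before use.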

Then you can iterate through tables -> rows -> cells:

for table in tables:
    for row in table.find_all("tr"):
        for cell in row.find_all("td"):
            print(cell.text, end = ' ' )
        print()
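Collecting the values instead of printing them makes them easier to reuse. A small sketch, assuming hypothetical markup whose header row uses `<th>` cells; `table_rows` is an illustrative helper name, not part of BeautifulSoup. Note that `find_all("tr")` searches recursively, so rows of any table nested inside the matched one would be included as well.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a header row of <th> cells plus two data rows.
HTML = """
<table class="calendar__table">
  <tr><th>Time</th><th>Currency</th><th>Event</th></tr>
  <tr><td>2:00am</td><td>EUR</td><td>German PPI m/m</td></tr>
  <tr><td>4:00am</td><td>EUR</td><td>Current Account</td></tr>
</table>
"""


def table_rows(soup, css_class):
    """Return the stripped <td> texts of every row in tables with the given class."""
    rows = []
    for table in soup.find_all("table", class_=css_class):
        for row in table.find_all("tr"):
            cells = [td.get_text(strip=True) for td in row.find_all("td")]
            if cells:  # header rows contain only <th> cells, so they yield []
                rows.append(cells)
    return rows


soup = BeautifulSoup(HTML, "html.parser")
for cells in table_rows(soup, "calendar__table"):
    print(" ".join(cells))
```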

Answer 2

import requests
from bs4 import BeautifulSoup

r = requests.get('http://www.forexfactory.com/calendar.php?day=nov18.2016')
soup = BeautifulSoup(r.text, 'lxml')

calendar_table = soup.find('table', class_="calendar__table")
for row in calendar_table.find_all('tr', class_=['calendar__row calendar_row','newday']):
    row_data = [td.get_text(strip=True) for td in row.find_all('td')]
    print(row_data)

Output:

['FriNov 18', '2:00am', 'EUR', '', 'German PPI m/m', '', '', '0.3%', '-0.2%', '']
['', '3:30am', 'EUR', '', 'ECB President Draghi Speaks', '', '', '', '', '']
['', '4:00am', 'EUR', '', 'Current Account', '', '', '31.3B', '29.7B', '']
['', '4:10am', 'GBP', '', 'MPC Member Broadbent Speaks', '', '', '', '', '']
['', '5:30am', 'CHF', '', 'Gov Board Member Maechler Speaks', '', '', '', '', '']
['', '8:30am', 'CAD', '', 'Core CPI m/m', '', '', '0.3%', '0.2%', '']
['', '9:30am', 'USD', '', 'FOMC Member Dudley Speaks', '', '', '', '', '']
['', '10:00am', 'USD', '', 'CB Leading Index m/m', '', '', '0.1%', '0.2%', '']
['', '9:45pm', 'USD', '', 'FOMC Member Powell Speaks', '', '', '', '', '']
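In the output above only the first row of the day carries the date; later rows leave that cell empty. A hedged sketch for carrying the date forward, using three rows copied from that output. `fill_dates` is a hypothetical helper, and reading indices 0, 1, 2 and 4 as date, time, currency and event is an assumption based on how the printed values line up, not something the page guarantees.

```python
# Three rows copied from the output above (each is a list of <td> texts).
rows = [
    ['FriNov 18', '2:00am', 'EUR', '', 'German PPI m/m', '', '', '0.3%', '-0.2%', ''],
    ['', '3:30am', 'EUR', '', 'ECB President Draghi Speaks', '', '', '', '', ''],
    ['', '4:00am', 'EUR', '', 'Current Account', '', '', '31.3B', '29.7B', ''],
]


def fill_dates(rows):
    """Copy the last non-empty date cell (index 0) into rows that leave it blank.

    Returns new lists; the input rows are not modified.
    """
    current_date = ""
    filled = []
    for row in rows:
        if row[0]:
            current_date = row[0]
        filled.append([current_date] + row[1:])
    return filled


for row in fill_dates(rows):
    # Assumed column meaning: 0 = date, 1 = time, 2 = currency, 4 = event.
    date, when, currency, event = row[0], row[1], row[2], row[4]
    print(date, when, currency, event)
```

Building new lists rather than editing `rows` in place keeps the raw scraped values available for comparison.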
