From a website

Time: 2016-11-17 23:58:34

Tags: python beautifulsoup lxml

I am trying to extract specific table values from a website (http://www.forexfactory.com/calendar.php?day=nov18.2016) using Python and BeautifulSoup.

Code so far:

from bs4 import BeautifulSoup
from urllib.request import urlopen

content = urlopen("http://www.forexfactory.com/calendar.php?day=nov18.2016").read()
soup = BeautifulSoup(content, 'html.parser')

tables = soup.findAll("table")
for table in tables:
    # Print only top-level tables (those not nested inside another table).
    if table.findParent("table") is None:
        print(table)

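I can print all of the tables and their contents, but how do I get the specific table with the class "calendar__table", and then iterate over it to get each row and its values?

2 Answers:

Answer 0: (score: 1)

You can pass attribute filters to the search: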

tables = soup.findAll("table", {'class':'calendar__table'})
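As a minimal sketch of how this filter behaves, the snippet below runs the same search against a small inline document. The two-table HTML is a made-up stand-in, not the real page's markup. It also shows the `class_` keyword form, which selects the same element here.

```python
from bs4 import BeautifulSoup

# Hypothetical two-table document, used only for illustration.
html = """
<table class="other"><tr><td>a</td></tr></table>
<table class="calendar__table"><tr><td>b</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# Attribute dictionary form, as in the answer above.
by_dict = soup.find_all("table", {"class": "calendar__table"})
# Keyword form; class_ has a trailing underscore because class is reserved.
by_keyword = soup.find_all("table", class_="calendar__table")

print(len(by_dict), by_dict[0] is by_keyword[0])
```

Both calls return a list-like result, so a page with several matching tables would yield several entries.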

Then you can iterate through tables -> rows -> cells:

for table in tables:
    for row in table.findAll("tr"):
        for cell in row.findAll("td"):
            print(cell.text, end=' ')
        print()
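The table -> rows -> cells loop can be tried offline before pointing it at the live site. The following is a rough sketch under stated assumptions: `SAMPLE_HTML` and the `extract_rows` helper are hypothetical names introduced for illustration, and the real page's structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page's HTML may differ.
SAMPLE_HTML = """
<table class="layout"><tr><td>ignored</td></tr></table>
<table class="calendar__table">
  <tr><th>Time</th><th>Currency</th><th>Event</th></tr>
  <tr><td>2:00am</td><td>EUR</td><td>German PPI m/m</td></tr>
  <tr><td>8:30am</td><td>CAD</td><td>Core CPI m/m</td></tr>
</table>
"""


def extract_rows(html, table_class="calendar__table"):
    """Return the cell texts of every data row in tables with the given class."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for table in soup.find_all("table", class_=table_class):
        for row in table.find_all("tr"):
            cells = [td.get_text(strip=True) for td in row.find_all("td")]
            if cells:  # skip rows that contain no <td> cells, such as header rows
                rows.append(cells)
    return rows


rows = extract_rows(SAMPLE_HTML)
for cells in rows:
    print(" ".join(cells))
```

Collecting the cells into lists rather than printing them directly makes the result easier to post-process later.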

Answer 1: (score: 1)

import requests
from bs4 import BeautifulSoup

r = requests.get('http://www.forexfactory.com/calendar.php?day=nov18.2016')
soup = BeautifulSoup(r.text, 'lxml')

calendar_table = soup.find('table', class_="calendar__table")
for row in calendar_table.find_all('tr', class_=['calendar__row calendar_row','newday']):
    row_data = [td.get_text(strip=True) for td in row.find_all('td')]
    print(row_data)

Output:

['FriNov 18', '2:00am', 'EUR', '', 'German PPI m/m', '', '', '0.3%', '-0.2%', '']
['', '3:30am', 'EUR', '', 'ECB President Draghi Speaks', '', '', '', '', '']
['', '4:00am', 'EUR', '', 'Current Account', '', '', '31.3B', '29.7B', '']
['', '4:10am', 'GBP', '', 'MPC Member Broadbent Speaks', '', '', '', '', '']
['', '5:30am', 'CHF', '', 'Gov Board Member Maechler Speaks', '', '', '', '', '']
['', '8:30am', 'CAD', '', 'Core CPI m/m', '', '', '0.3%', '0.2%', '']
['', '9:30am', 'USD', '', 'FOMC Member Dudley Speaks', '', '', '', '', '']
['', '10:00am', 'USD', '', 'CB Leading Index m/m', '', '', '0.1%', '0.2%', '']
['', '9:45pm', 'USD', '', 'FOMC Member Powell Speaks', '', '', '', '', '']
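In the output above, the date appears only in the first row and later rows leave that cell empty. One possible way to post-process such rows is sketched below. The `COLUMNS` names and the `rows_to_records` helper are assumptions made for illustration; the answer does not say what each of the ten columns means.

```python
# Assumed names for the ten cells in each row; check them against the page.
COLUMNS = ["date", "time", "currency", "impact", "event",
           "detail", "actual", "forecast", "previous", "graph"]


def rows_to_records(rows):
    """Map each row to a dict, carrying the last seen date forward."""
    records = []
    current_date = ""
    for row in rows:
        record = dict(zip(COLUMNS, row))
        if record.get("date"):
            current_date = record["date"]  # a new day starts on this row
        else:
            record["date"] = current_date  # reuse the most recent date
        records.append(record)
    return records


# Two rows copied from the output shown above.
sample = [
    ['FriNov 18', '2:00am', 'EUR', '', 'German PPI m/m', '', '', '0.3%', '-0.2%', ''],
    ['', '3:30am', 'EUR', '', 'ECB President Draghi Speaks', '', '', '', '', ''],
]
records = rows_to_records(sample)
for record in records:
    print(record["date"], record["time"], record["currency"], record["event"])
```

With every record carrying its own date, the rows can be filtered or sorted independently of their original order.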