I am trying to extract specific table values from a website (http://www.forexfactory.com/calendar.php?day=nov18.2016) using Python and BeautifulSoup.
My code so far:
from bs4 import BeautifulSoup
from urllib.request import urlopen

content = urlopen("http://www.forexfactory.com/calendar.php?day=nov18.2016").read()
soup = BeautifulSoup(content, 'html.parser')
tables = soup.findAll("table")
for table in tables:
    # Print only top-level tables (those not nested inside another table)
    if table.findParent("table") is None:
        print(table)
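This prints every table and its contents, but how do I get one specific table, the one named "calendar__table", and then iterate over it to get each row and its values?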
Answer 0 (score: 1)
You can pass attributes as arguments to the search:
tables = soup.findAll("table", {'class':'calendar__table'})
Then you can iterate through tables -> rows -> cells:
for table in tables:
    for row in table.findAll("tr"):
        for cell in row.findAll("td"):
            print(cell.text, end = ' ' )
        print()
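For trying this out without hitting the network, here is a minimal sketch that runs the same table -> row -> cell walk over an inline HTML string. The markup in SAMPLE_HTML and the helper name extract_rows are made up for illustration; the real page's structure may differ. It also guards against the table not being found, since find returns None in that case.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for the real page.
SAMPLE_HTML = """
<table class="layout">
  <tr><td>
    <table class="calendar__table">
      <tr><td>2:00am</td><td>EUR</td><td>German PPI m/m</td></tr>
      <tr><td>8:30am</td><td>CAD</td><td>Core CPI m/m</td></tr>
    </table>
  </td></tr>
</table>
"""


def extract_rows(html, table_class="calendar__table"):
    """Return the cell texts of each row in the first table with the given class."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_=table_class)
    if table is None:
        # No matching table: return an empty list instead of raising.
        return []
    rows = []
    for row in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if cells:  # skip rows without <td> cells, e.g. header rows
            rows.append(cells)
    return rows


rows = extract_rows(SAMPLE_HTML)
for cells in rows:
    print(cells)
```

Returning lists of cell texts rather than printing inside the loop makes the result easier to reuse, for example to write it to a CSV file later.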
Answer 1 (score: 1)
import requests
from bs4 import BeautifulSoup

r = requests.get('http://www.forexfactory.com/calendar.php?day=nov18.2016')
soup = BeautifulSoup(r.text, 'lxml')
calendar_table = soup.find('table', class_="calendar__table")
for row in calendar_table.find_all('tr', class_=['calendar__row calendar_row','newday']):
    row_data = [td.get_text(strip=True) for td in row.find_all('td')]
    print(row_data)
Output:
['FriNov 18', '2:00am', 'EUR', '', 'German PPI m/m', '', '', '0.3%', '-0.2%', '']
['', '3:30am', 'EUR', '', 'ECB President Draghi Speaks', '', '', '', '', '']
['', '4:00am', 'EUR', '', 'Current Account', '', '', '31.3B', '29.7B', '']
['', '4:10am', 'GBP', '', 'MPC Member Broadbent Speaks', '', '', '', '', '']
['', '5:30am', 'CHF', '', 'Gov Board Member Maechler Speaks', '', '', '', '', '']
['', '8:30am', 'CAD', '', 'Core CPI m/m', '', '', '0.3%', '0.2%', '']
['', '9:30am', 'USD', '', 'FOMC Member Dudley Speaks', '', '', '', '', '']
['', '10:00am', 'USD', '', 'CB Leading Index m/m', '', '', '0.1%', '0.2%', '']
['', '9:45pm', 'USD', '', 'FOMC Member Powell Speaks', '', '', '', '', '']
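If named fields are more convenient than positional lists, the rows above can be turned into dictionaries. This is only a sketch: the column names in COLUMNS are my guess at what the ten cells mean and are not confirmed by the page, and rows_to_records is a hypothetical helper. Since only the first row of a day carries the date, the sketch copies the last seen date into the rows that follow.

```python
# Assumed column names for the ten cells in each row; adjust if the page differs.
COLUMNS = ["date", "time", "currency", "impact", "event",
           "detail", "actual", "forecast", "previous", "graph"]


def rows_to_records(rows, columns=COLUMNS):
    """Map each row list to a dict and carry the date forward to later rows."""
    records = []
    last_date = ""
    for row in rows:
        record = dict(zip(columns, row))
        if record.get("date"):
            last_date = record["date"]
        else:
            # Rows after the first of a day have an empty date cell.
            record["date"] = last_date
        records.append(record)
    return records


# Two rows copied from the output above.
sample = [
    ['FriNov 18', '2:00am', 'EUR', '', 'German PPI m/m', '', '', '0.3%', '-0.2%', ''],
    ['', '3:30am', 'EUR', '', 'ECB President Draghi Speaks', '', '', '', '', ''],
]
records = rows_to_records(sample)
for record in records:
    print(record["date"], record["time"], record["currency"], record["event"])
```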