I have almost no knowledge of web scraping in Python.
I need to get a table from this page:
http://performance.morningstar.com/funds/etf/total-returns.action?t=IWF
This is what I have so far:
from selenium import webdriver
from bs4 import BeautifulSoup
# load chrome driver
driver = webdriver.Chrome('C:/.../chromedriver_win32/chromedriver')
# load web page and get source html
link = 'http://performance.morningstar.com/funds/etf/total-returns.action?t=IWF'
driver.get(link)
html = driver.page_source
# make soup and get all tables
soup = BeautifulSoup(html, 'html.parser')
tables = soup.findAll('table',{'class':'r_table3'})
tbl = tables[1] # ideally we should select table by name
Where do I go from here?
Answer 0 (score: 1)
To get the data from that web page, you can do something like this:
from selenium import webdriver
from bs4 import BeautifulSoup
import time
driver = webdriver.Chrome()
link = 'http://performance.morningstar.com/funds/etf/total-returns.action?t=IWF'
driver.get(link)
time.sleep(3)
soup = BeautifulSoup(driver.page_source, 'lxml')
driver.quit()
tab_data = soup.select('table')[1]
# print every row of the table, header cells and data cells alike
for items in tab_data.select('tr'):
    item = [elem.text for elem in items.select('th,td')]
    print(' '.join(item))
Partial results:
Total Return % 1-Day 1-Week 1-Month 3-Month YTD 1-Year 3-Year 5-Year 10-Year 15-Year
IWF (Price) 0.13 0.83 2.68 5.67 23.07 26.60 15.52 15.39 8.97 10.14
IWF (NAV) 0.20 0.86 2.66 5.70 23.17 26.63 15.52 15.40 8.98 10.14
S&P 500 TR USD (Price) 0.18 0.52 2.42 4.52 16.07 22.40 13.51 14.34 7.52 9.76
Answer 1 (score: 0)
OK, so this is how I did it:
int
Is there a more elegant way, i.e. one that does not loop over the rows and columns by hand, where I can just call a ready-made method?
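One ready-made option that may fit here is `pandas.read_html`, which parses every `<table>` in an HTML string into a DataFrame, so no manual loop over rows and cells is needed. The sketch below is only an illustration under assumptions: pandas and a parser such as lxml are installed, the `html` string is a simplified stand-in for `driver.page_source` (its values are copied from the partial results above, its markup is guessed), and the `r_table3` class from the question is assumed to be the table's only class.

```python
import io

import pandas as pd

# Stand-in for driver.page_source. The numbers come from the partial output
# shown above; the exact markup of the real page is an assumption.
html = """
<table class="r_table3">
  <tr><th>Total Return %</th><th>1-Day</th><th>1-Week</th><th>1-Month</th></tr>
  <tr><th>IWF (Price)</th><td>0.13</td><td>0.83</td><td>2.68</td></tr>
  <tr><th>IWF (NAV)</th><td>0.20</td><td>0.86</td><td>2.66</td></tr>
</table>
"""

# read_html returns a list of DataFrames, one per <table> that matches attrs.
tables = pd.read_html(io.StringIO(html), attrs={"class": "r_table3"})

# Use the row labels (first column) as the index for easier lookups.
df = tables[0].set_index("Total Return %")
print(df)
```

With the real page you would pass `io.StringIO(driver.page_source)` instead of the sample string, and a single value can then be looked up with something like `df.loc["IWF (NAV)", "1-Week"]`.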