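For example, I am unable to parse the data from the following link:
https://www.bseindia.com/stock-share-price/avanti-feeds-ltd/avanti/512573/
I want to populate the high/low table from this web page. I have tried many combinations of table and div selectors, but to no avail. Below is my Python BeautifulSoup (bs4) code: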
import csv
import urllib.request
from bs4 import BeautifulSoup

# Input and output files get separate names so neither handle shadows the other.
with open("bselist.csv") as infile, open("bse.csv", "w", newline="") as outfile:
    writer = csv.writer(outfile)
    for row in csv.reader(infile):
        for stock in row:
            url = "https://www.bseindia.com/stock-share-price/{}".format(stock)
            soup = BeautifulSoup(urllib.request.urlopen(url).read(), "lxml")
            # Text of every <span> inside the first div with this class
            spans = soup("div", {"class": "newscripcotent5"})[0].find_all("span")
            mydivs = [span.get_text(strip=True) for span in spans]
            writer.writerow([stock] + mydivs)
            print([stock] + mydivs)
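For simplicity, the URL above links directly to one of the records contained in the file bselist.csv. I am looking for the div with the id "highlow".
It just gives me the following output:
avanti-feeds-ltd/avanti/512573/
with none of the table I am looking for.
Ideally, the output should look something like this: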
avanti-feeds-ltd/avanti/512573/ 52 Week High (adjusted) 999.00(13/11/2017)
avanti-feeds-ltd/avanti/512573/ 52 Week Low (adjusted) 410.26(05/06/2018)
avanti-feeds-ltd/avanti/512573/ 52 Week High (Unadjusted) 3,000.00(13/11/2017)
avanti-feeds-ltd/avanti/512573/ 52 Week Low (Unadjusted) 535.50(29/06/2018)
avanti-feeds-ltd/avanti/512573/ Month H/L 659.34/410.26
avanti-feeds-ltd/avanti/512573/ Week H/L 625.25/508.82
Answer 0 (score: 0)
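The information you are trying to fetch appears to be populated dynamically with JavaScript, which is probably why you cannot get it from the raw HTML. To work around this, you can use Selenium WebDriver to fetch the dynamic content.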
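One way to check this diagnosis is to look at what the target div contains before any JavaScript runs. The sketch below uses only the standard library's html.parser (swapped in here for BeautifulSoup so that it has no dependencies). The two HTML strings are hypothetical stand-ins for the static response and the browser-rendered page, not copies of the real site.

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Collect the text inside the first <div> that carries a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0      # nesting depth of <div> tags inside the target
        self.done = False   # True once the target div has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div" or self.done:
            return
        if self.depth:
            self.depth += 1
        else:
            classes = (dict(attrs).get("class") or "").split()
            if self.css_class in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def static_text(html, css_class):
    """Return the non-empty text fragments found inside the target div."""
    parser = DivTextExtractor(css_class)
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical samples: an empty shell versus the same div after rendering.
STATIC_HTML = '<div class="newscripcotent5"><table><tr><td></td></tr></table></div>'
RENDERED_HTML = ('<div class="newscripcotent5"><table><tr>'
                 '<td>Month H/L</td><td>659.34/410.26</td></tr></table></div>')

print(static_text(STATIC_HTML, "newscripcotent5"))    # []
print(static_text(RENDERED_HTML, "newscripcotent5"))  # ['Month H/L', '659.34/410.26']
```

If the HTML returned by urllib gives an empty list for the div while the browser shows values, the data is most likely being filled in client-side.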
Here is what the Selenium code looks like:
import csv
from bs4 import BeautifulSoup
from selenium import webdriver

# Selenium 3 style call; Selenium 4 expects
# webdriver.Chrome(service=Service('/path/to/chromedriver')) instead.
driver = webdriver.Chrome('/path/to/chromedriver')

with open("bselist.csv") as infile, open('bse.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)  # quotes values such as "3,000.00"
    for record in csv.reader(infile):
        for stock in record:
            url = "https://www.bseindia.com/stock-share-price/{}".format(stock)
            driver.get(url)
            # page_source reflects the DOM as the browser currently has it
            soup = BeautifulSoup(driver.page_source, "html.parser")
            div = soup.find_all('div', {"class": "newscripcotent5"})[0]
            outer_table = div.find_all('table')[0]
            inner_table = outer_table.find_all('table')[0]
            for tr in inner_table.find_all('tr'):
                cols = tr.find_all('td')
                if len(cols) < 2:
                    continue  # skip header or spacer rows
                label, value = cols[0].get_text(), cols[1].get_text()
                writer.writerow([stock, label, value])
                print(stock + " " + label + " " + value)

driver.quit()  # close the browser once all stocks are processed
Be sure to replace /path/to/chromedriver with the appropriate path to your chromedriver.
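Because the table is filled in by JavaScript, it may not be present yet at the moment page_source is read. A minimal sketch of a polling helper is shown below; wait_until is a hypothetical name, not part of Selenium, and Selenium ships its own WebDriverWait class (in selenium.webdriver.support.ui) that serves the same purpose.

```python
import time


def wait_until(predicate, timeout=10.0, poll=0.5):
    """Call predicate() repeatedly until it returns a truthy value.

    Raises TimeoutError if nothing truthy is returned within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "condition not met within {} seconds".format(timeout))
        time.sleep(poll)


# Hypothetical usage with a Selenium driver (assumes "52 Week High"
# appears in the page once the table has been rendered):
#   wait_until(lambda: "52 Week High" in driver.page_source)
#   html = driver.page_source
```

The timeout and poll interval are arbitrary defaults; the marker text to wait for is an assumption based on the expected output.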
So, assuming your bselist.csv contains:
avanti-feeds-ltd/avanti/512573/
you will get the following output:
avanti-feeds-ltd/avanti/512573/ 52 Week High (adjusted) 999.00(13/11/2017)
avanti-feeds-ltd/avanti/512573/ 52 Week Low (adjusted) 410.26(05/06/2018)
avanti-feeds-ltd/avanti/512573/ 52 Week High (Unadjusted) 3,000.00(13/11/2017)
avanti-feeds-ltd/avanti/512573/ 52 Week Low (Unadjusted) 507.00(02/07/2018)
avanti-feeds-ltd/avanti/512573/ Month H/L 659.34/410.26
avanti-feeds-ltd/avanti/512573/ Week H/L 615.00/507.00
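One detail worth noting in this output: a value such as 3,000.00(13/11/2017) contains a comma, so it has to be quoted when stored as CSV or it will be read back as two columns. The sketch below shows how the standard csv module handles that; rows_to_csv is a hypothetical helper, and the rows are copied from the output above.

```python
import csv
import io

rows = [
    ["avanti-feeds-ltd/avanti/512573/", "52 Week High (Unadjusted)",
     "3,000.00(13/11/2017)"],
    ["avanti-feeds-ltd/avanti/512573/", "Month H/L", "659.34/410.26"],
]


def rows_to_csv(rows):
    """Serialise rows to CSV text, quoting only the fields that need it."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


text = rows_to_csv(rows)
print(text)

# Reading the text back yields the original three columns per row.
parsed = list(csv.reader(io.StringIO(text)))
print(parsed == rows)  # True
```

The first printed line ends with "3,000.00(13/11/2017)" in double quotes, which is what keeps the value in a single column.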
If you do not already have selenium and chromedriver, you will need to install them first. I installed them on Mac OS like this:
sudo easy_install selenium
sudo easy_install chromedriver
You may find the following posts helpful: