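I'm trying to write code that scrapes the data from a table on a Nasdaq page without using pandas or NumPy, but every approach I've tried keeps raising an AttributeError.
I've tried following tutorials and switching parsers, but nothing has worked.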
import requests
from bs4 import BeautifulSoup as soup

# Fetch the page's HTML as text
my_url = requests.get("https://www.nasdaq.com/market-activity/stocks/rshpf/historical").text

# Parse it with Python's built-in HTML parser
page_soup = soup(my_url, "html.parser")

# Look up the first <table> element on the parsed page (not on the soup class itself)
table = page_soup.find("table")
print(table)
I was expecting to see the table's HTML.
Answer:
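The page is dynamic: the data is rendered after the HTML has been loaded, so the table tag in the HTML that requests retrieves is empty.
You can get the data from the API instead and then convert it into a DataFrame: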
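One way to check that the table really is empty, without BeautifulSoup, pandas or NumPy, is to count the tr rows inside table tags using the standard library's html.parser.HTMLParser. This is a minimal sketch: TableRowCounter and count_table_rows are hypothetical names, and the sample markup is invented for illustration rather than copied from the Nasdaq page. If the text returned by requests.get(...).text yields a table with zero rows, that is consistent with the rows being filled in after the HTML is loaded.

```python
from html.parser import HTMLParser


class TableRowCounter(HTMLParser):
    """Counts <table> elements and the <tr> rows nested inside them."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many <table> tags we are currently inside
        self.tables = 0  # number of <table> tags seen
        self.rows = 0    # number of <tr> tags seen inside any table

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase
        if tag == "table":
            self.tables += 1
            self.depth += 1
        elif tag == "tr" and self.depth > 0:
            self.rows += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1


def count_table_rows(html):
    """Return (number of tables, number of rows inside tables) for an HTML string."""
    parser = TableRowCounter()
    parser.feed(html)
    parser.close()
    return parser.tables, parser.rows


# Hypothetical markup: an empty table shell, as a dynamically rendered page might serve it
shell = '<div><table class="historical-data"><thead></thead><tbody></tbody></table></div>'
print(count_table_rows(shell))   # (1, 0)

# The same kind of table once rows are present
filled = "<table><tr><td>$1.97</td></tr><tr><td>$1.96</td></tr></table>"
print(count_table_rows(filled))  # (1, 2)
```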
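import pandas as pd
import requests

# JSON endpoint that supplies the historical-data table
url = 'https://api.nasdaq.com/api/quote/RSHPF/historical'
payload = {
    'assetclass': 'stocks',
    'fromdate': '2019-10-06',
    'limit': '100',
    'todate': '2019-11-06'}

# Request the data and decode the JSON response
jsonData = requests.get(url, params=payload).json()

# The table rows are nested under data -> tradesTable -> rows
df = pd.DataFrame(jsonData['data']['tradesTable']['rows'])
print(df.to_string())

Output: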
    close        date   high    low   open  volume
0   $1.97  11/05/2019  $1.97  $1.97  $1.97     N/A
1   $1.97  11/04/2019  $1.97  $1.97  $1.97     N/A
2   $1.97  11/01/2019  $1.97  $1.97  $1.97  12,600
3   $1.96  10/31/2019  $1.96  $1.96  $1.96     N/A
4   $1.96  10/30/2019  $1.96  $1.96  $1.96     N/A
5   $1.96  10/29/2019  $1.96  $1.96  $1.96     N/A
6   $1.96  10/28/2019  $1.96  $1.96  $1.96     N/A
7   $1.96  10/25/2019  $1.96  $1.96  $1.96     N/A
8   $1.96  10/24/2019  $1.96  $1.96  $1.96     N/A
9   $1.96  10/23/2019  $1.96  $1.96  $1.96     N/A
10  $1.96  10/22/2019  $1.96  $1.96  $1.96     N/A
11  $1.96  10/21/2019  $1.96  $1.96  $1.96     N/A
12  $1.96  10/18/2019  $1.96  $1.96  $1.96     N/A
13  $1.96  10/17/2019  $1.96  $1.96  $1.96     N/A
14  $1.96  10/16/2019  $1.96  $1.96  $1.96     N/A
15  $1.96  10/15/2019  $1.96  $1.96  $1.96   7,650
16     $2  10/14/2019     $2     $2     $2     N/A
17     $2  10/11/2019     $2     $2     $2     N/A
18     $2  10/10/2019     $2     $2     $2     250
19     $2  10/09/2019     $2     $2     $2     250
20     $2  10/08/2019     $2     $2     $2     N/A
21     $2  10/07/2019     $2     $2     $2     200
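The answer relies on pandas, which the question wanted to avoid. The sketch below does the same job with the standard library only. It assumes the JSON keeps the data, tradesTable, rows nesting used in the answer and that the values are strings such as "$1.97", "12,600" and "N/A", as in the output above. fetch_history, parse_price, parse_volume and rows_to_records are hypothetical helper names; fetch_history is defined but not called here, and it does not set any request headers the API might require. The example runs offline on two rows copied from the output above.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://api.nasdaq.com/api/quote/RSHPF/historical"


def fetch_history(params):
    """Download the JSON payload from the endpoint used in the answer (stdlib only).

    Assumption: the endpoint answers a plain GET; no extra headers are sent.
    """
    url = API_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def parse_price(text):
    """'$1.97' -> 1.97, 'N/A' or '' -> None."""
    text = text.strip()
    if text in ("", "N/A"):
        return None
    return float(text.replace("$", "").replace(",", ""))


def parse_volume(text):
    """'12,600' -> 12600, 'N/A' or '' -> None."""
    text = text.strip()
    if text in ("", "N/A"):
        return None
    return int(text.replace(",", ""))


def rows_to_records(json_data):
    """Turn data -> tradesTable -> rows into a list of dicts with numeric values."""
    records = []
    for row in json_data["data"]["tradesTable"]["rows"]:
        records.append({
            "date": row["date"],
            "open": parse_price(row["open"]),
            "high": parse_price(row["high"]),
            "low": parse_price(row["low"]),
            "close": parse_price(row["close"]),
            "volume": parse_volume(row["volume"]),
        })
    return records


# Offline sample shaped like two of the rows printed above
sample = {"data": {"tradesTable": {"rows": [
    {"close": "$1.97", "date": "11/01/2019", "high": "$1.97",
     "low": "$1.97", "open": "$1.97", "volume": "12,600"},
    {"close": "$2", "date": "10/14/2019", "high": "$2",
     "low": "$2", "open": "$2", "volume": "N/A"},
]}}}

for rec in rows_to_records(sample):
    print(rec["date"], rec["close"], rec["volume"])
# 11/01/2019 1.97 12600
# 10/14/2019 2.0 None
```

Against the live endpoint this would be called as rows_to_records(fetch_history({'assetclass': 'stocks', 'fromdate': '2019-10-06', 'limit': '100', 'todate': '2019-11-06'})), reusing the answer's query parameters. Converting "N/A" to None rather than 0 keeps missing volumes distinguishable from days with zero trades.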