使用https://www.mubasher.info/countries/eg/stock-prices中的HTML,我试图建立股价公司,其价格来自HTML中表格的原始数据,
我在python 3.7中尝试了以下代码
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as bs
import re
quotes_page = 'https://www.mubasher.info/countries/eg/stock-prices'
uClient = uReq(quotes_page)
page_content = uClient.read()
uClient.close()
soup = bs(page_content, 'html.parser')
table = soup.findChildren('table')[0]
rows = table.findChildren('tr')
for row in rows:
cells = row.findChildren('td')
for cell in cells:
cell_content = cell.getText()
clean_content = re.sub( '\s+', ' ', cell_content).strip()
print(clean_content)
#它显示以下结果,而不是页面中的实际值
{{row.name | limitTo : 20}}
{{row.value}}
{{row.changePercentage}}
{{row.change}}
{{row.turnover}}
{{row.volume}}
{{row.open}}
{{row.high}}
{{row.low}}
答案 0 :(得分:0)
该数据/表是动态的。它是在初始请求后呈现的。有一个API,您可以直接使用它:
您可以通过“检查”呈现的页面并找到适当的XHR来找到它:
import requests
from pandas.io.json import json_normalize
url = 'https://www.mubasher.info/api/1/stocks/prices'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}
payload = {'country': 'eg'}
jsonData = requests.get(url, headers=headers, params=payload).json()
prices_df = json_normalize(jsonData['prices'])
输出:
print (prices_df)
change changePercentage ... value volume
0 0.43 9.56% ... 4.93 12,390
1 0.93 8.15% ... 12.34 47,648
2 0.47 7.21% ... 6.99 85
3 0.01 7.05% ... 0.17 20,530
4 0.08 6.73% ... 1.30 90,177
5 0.77 6.17% ... 13.25 28,350
6 0.00 5.77% ... 0.06 69,885
7 0.10 5.03% ... 2.09 174,638
8 0.21 5.00% ... 4.41 241,000
9 0.27 4.59% ... 6.15 50
10 0.30 4.54% ... 6.91 1,078,168
11 2.01 4.53% ... 46.39 1,050
12 0.03 4.45% ... 0.61 67,938
13 0.38 4.21% ... 9.40 57,518
14 0.43 3.72% ... 11.98 20
15 0.10 3.66% ... 2.83 192,764
16 0.60 3.63% ... 17.15 258,029
17 2.30 3.09% ... 76.85 3,892,116
18 0.34 3.02% ... 11.59 132,480
19 0.01 3.00% ... 0.38 412,967
20 0.22 2.88% ... 7.87 385,284
21 0.11 2.44% ... 4.62 11,048
22 0.03 2.36% ... 1.08 71,343
23 1.64 2.18% ... 76.75 90
24 0.03 2.18% ... 1.45 675,330
25 0.24 2.12% ... 11.54 348,092
26 0.31 2.04% ... 15.48 4,450
27 0.08 1.81% ... 4.51 3,393,121
28 15.26 1.68% ... 925.00 15
29 0.69 1.66% ... 42.20 585,712
.. ... ... ... ... ...
131 -0.09 -4.86% ... 1.67 37,000
132 -0.63 -9.86% ... 5.76 7,000
133 -12.25 -100.00% ... 12.10 1,000
134 -0.95 -100.00% ... 0.93 5,000
135 -44.04 -100.00% ... 44.05 150
136 -9.01 -100.00% ... 9.30 850
137 -8.06 -100.00% ... 8.00 1,055
138 -6.31 -100.00% ... 6.31 589
139 -3.87 -100.00% ... 3.81 333
140 -12.92 -100.00% ... 12.02 325
141 -7.80 -100.00% ... 7.80 500
142 -0.68 -100.00% ... 0.68 38,000
143 -0.86 -100.00% ... 0.82 32,870
144 -0.94 -100.00% ... 0.93 4,000
145 -1.71 -100.00% ... 1.70 7,250
146 -38.27 -100.00% ... 38.13 401
147 -1.20 -100.00% ... 1.22 480
148 -2.37 -100.00% ... 2.30 12,755
149 -11.43 -100.00% ... 11.29 1,115
150 -0.39 -100.00% ... 0.38 14,000
151 -5.28 -100.00% ... 5.21 1,200
152 -9.00 -100.00% ... 9.00 1,510
153 -1.28 -100.00% ... 1.28 2,000
154 -4.45 -100.00% ... 4.50 6,350
155 -31.70 -100.00% ... 32.97 500
156 -14.52 -100.00% ... 14.41 453
157 -4.60 -100.00% ... 4.78 3,670
158 -6.79 -100.00% ... 6.81 4,002
159 -3.84 -100.00% ... 3.76 5,800
160 -0.80 -100.00% ... 0.80 14,890
[161 rows x 15 columns]