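I recently started learning more about Python and how to parse websites with BeautifulSoup.
The problem I'm facing now is that I seem to be stuck.
HTML code (taken from the soup):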
<div class="mod-3-piece-app__visual-container__chart">
<div class="mod-ui-chart--dynamic" data-chart-config='{"chartData":{"periods":[{"year":2013,"period":null,"periodicity":"A","icon":null},{"year":2014,"period":null,"periodicity":"A","icon":null},{"year":2015,"period":null,"periodicity":"A","icon":null},{"year":2016,"period":null,"periodicity":"A","icon":null},{"year":2017,"period":null,"periodicity":"A","icon":null},{"year":2018,"period":null,"periodicity":"A","icon":null}],"forecastRange":{"from":3.5,"to":5.5},"actualValues":[5.6785,6.45,9.22,8.31,null,null],"consensusData":[{"y":5.6307,"toolTipData":{"low":5.5742,"high":5.7142,"analysts":34,"restatement":null}},{"y":6.3434,"toolTipData":{"low":6.25,"high":6.5714,"analysts":35,"restatement":null}},{"y":9.1265,"toolTipData":{"low":9.02,"high":9.28,"analysts":40,"restatement":null}},{"y":8.2734,"toolTipData":{"low":8.17,"high":8.335,"analysts":40,"restatement":null}},{"y":8.9304,"toolTipData":{"low":8.53,"high":9.63,"analysts":41,"restatement":null}},{"y":10.1252,"toolTipData":{"low":8.63,"high":11.61,"analysts":42,"restatement":null}}]}}'>
<noscript>
<div class="mod-ui-chart--static">
<div class="mod-ui-chart--sprited" style="width:410px; height:135px; background:url('/data/Charts/EquityForecast?issueID=36276&height=135&width=410') 0px -270px no-repeat;">
</div>
</div>
</noscript>
</div>
</div>
My code:
from bs4 import BeautifulSoup
import urllib.request
data = []
List = ['AAPL']
# Iterates Through List
for i in List :
    # The webpage which we wish to Parse
    soup = BeautifulSoup(urllib.request.urlopen('https://markets.ft.com/data/equities/tearsheet/forecasts?s=AAPL:NSQ').read(), 'lxml')
    # Gathering the data
    Values = soup.find_all("div", {"class":"mod-3-piece-app__visual-container__chart"})[4]
    print(Values)
    # Getting desired values from data
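What I'd like to get are the values after {"y": ..., i.e. the numbers 5.6307, 6.3434, 9.1265, 8.2734, 8.9304 and 10.1252, but I can't for the life of me figure out how. I tried Values.get_text as well as Values.text, but both just come back blank (probably because all of the data sits inside a list or something similar).
It would also be fine if I could get the data after "toolTipData".
Would anyone mind helping me?
If I've left anything out, please give me feedback so that I can ask better questions in the future.
Thanks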
Answer 0 (score: 1)
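In short, you want to get some information that is located inside an attribute of a tag.
All I had to do was:
1. Use find_all to look for divs with the right class attribute, mod-ui-chart--dynamic.
2. For each element located by find_all, use .get() to read its data-chart-config attribute and check whether it contains 'actualValues'.
3. If it does contain 'actualValues', load the attribute as JSON and navigate its values.
Try the following code. I've commented it, so you should be able to understand what it is doing.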
Code:
from bs4 import BeautifulSoup
import urllib.request
import json
data = []
List = ['AAPL']
# Iterates Through List
for i in List:
    # The webpage which we wish to Parse
    soup = BeautifulSoup(urllib.request.urlopen('https://markets.ft.com/data/equities/tearsheet/forecasts?s=AAPL:NSQ').read(), 'lxml')
    # Gathering the data
    elemList = soup.find_all('div', {'class':'mod-ui-chart--dynamic'})
    #we will get the attribute info of each `data-chart-config` tag, inside each `div` with `class=mod-ui-chart--dynamic`
    for elem in elemList:
        elemID = elem.get('class')
        elemName = elem.get('data-chart-config')
        #if there's no value in elemName, pass...
        if elemName is None:
            pass
        #if the term 'actualValues' exists in elemName
        elif 'actualValues' in elemName:
            #print('Extracting actualValues from:\n')
            #print("Attribute id = %s" % elemID)
            #print()
            #print("Attribute name = %s" % elemName)
            #print()
            #reading `data-chart-config` attribute as a json
            data = json.loads(elemName)
            #print(json.dumps(data, indent=4, sort_keys=True))
            #print(data['chartData']['actualValues'])
            #fetching desired info
            val1 = data['chartData']['actualValues'][0]
            val2 = data['chartData']['actualValues'][1]
            val3 = data['chartData']['actualValues'][2]
            val4 = data['chartData']['actualValues'][3]
            #printing desired values
            print(val1, val2, val3, val4)
            print('-'*15)
Output:
1.9 1.42 1.67 3.36
---------------
5.6785 6.45 9.22 8.31
---------------
50557000000 42358000000 46852000000 78351000000
---------------
170910000000 182795000000 233715000000 215639000000
---------------
p.s.1: If needed, you can uncomment the print() calls inside the elif block to see what the program is doing.
p.s.2: If needed, you can change 'actualValues' in val1 = data['chartData']['actualValues'][0] to another key of chartData.
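Since the question asks for the "y" values and the "toolTipData" rather than actualValues, here is a minimal sketch of that variation. It assumes the attribute text has already been read with elem.get('data-chart-config'); the string below is a shortened copy of the attribute shown in the question (two of the six consensusData entries). The names chart_config and consensus_points are made up for illustration and are not part of BeautifulSoup or the site.

```python
import json

# Assumption: this is what elem.get('data-chart-config') returns.
# Shortened copy of the attribute from the question (2 of 6 entries).
chart_config = (
    '{"chartData":{"actualValues":[5.6785,6.45],'
    '"consensusData":['
    '{"y":5.6307,"toolTipData":{"low":5.5742,"high":5.7142,'
    '"analysts":34,"restatement":null}},'
    '{"y":6.3434,"toolTipData":{"low":6.25,"high":6.5714,'
    '"analysts":35,"restatement":null}}'
    ']}}'
)


def consensus_points(config_text):
    """Return (y, toolTipData) pairs from a data-chart-config JSON string.

    Hypothetical helper, written only for this example.
    """
    chart_data = json.loads(config_text)['chartData']
    # Each consensusData entry is a dict, so the number sits under 'y'.
    # .get() with a default avoids a KeyError on charts without this key.
    return [(point['y'], point['toolTipData'])
            for point in chart_data.get('consensusData', [])]


points = consensus_points(chart_config)

# Just the "y" numbers the question asks for
y_values = [y for y, _ in points]
print(y_values)

# The toolTipData that belongs to each "y" value
for y, tip in points:
    print(y, tip['low'], tip['high'], tip['analysts'])
```

Note that, unlike actualValues, the entries of consensusData are dictionaries, so indexing with [0] alone gives a dict and the number has to be read from its 'y' key. In the loop from the answer, the same helper could be called on elemName wherever the attribute contains 'consensusData'.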