Question

I've recently started learning more about Python and how to parse websites using BeautifulSoup.

The problem I'm facing now is that I seem to be stuck.

HTML code (taken from soup):

<div class="mod-3-piece-app__visual-container__chart">
    <div class="mod-ui-chart--dynamic" data-chart-config='{"chartData":{"periods":[{"year":2013,"period":null,"periodicity":"A","icon":null},{"year":2014,"period":null,"periodicity":"A","icon":null},{"year":2015,"period":null,"periodicity":"A","icon":null},{"year":2016,"period":null,"periodicity":"A","icon":null},{"year":2017,"period":null,"periodicity":"A","icon":null},{"year":2018,"period":null,"periodicity":"A","icon":null}],"forecastRange":{"from":3.5,"to":5.5},"actualValues":[5.6785,6.45,9.22,8.31,null,null],"consensusData":[{"y":5.6307,"toolTipData":{"low":5.5742,"high":5.7142,"analysts":34,"restatement":null}},{"y":6.3434,"toolTipData":{"low":6.25,"high":6.5714,"analysts":35,"restatement":null}},{"y":9.1265,"toolTipData":{"low":9.02,"high":9.28,"analysts":40,"restatement":null}},{"y":8.2734,"toolTipData":{"low":8.17,"high":8.335,"analysts":40,"restatement":null}},{"y":8.9304,"toolTipData":{"low":8.53,"high":9.63,"analysts":41,"restatement":null}},{"y":10.1252,"toolTipData":{"low":8.63,"high":11.61,"analysts":42,"restatement":null}}]}}'>
        <noscript>
            <div class="mod-ui-chart--static">
                <div class="mod-ui-chart--sprited" style="width:410px; height:135px; background:url('/data/Charts/EquityForecast?issueID=36276&amp;height=135&amp;width=410') 0px -270px no-repeat;">
                </div>
            </div>
        </noscript>
    </div>
</div>

My code:

from bs4 import BeautifulSoup
import urllib.request


data = []
List = ['AAPL']

# Iterates Through List
for i in List :   
    # The webpage which we wish to Parse
    soup = BeautifulSoup(urllib.request.urlopen('https://markets.ft.com/data/equities/tearsheet/forecasts?s=AAPL:NSQ').read(), 'lxml')

    # Gathering the data
    Values = soup.find_all("div", {"class":"mod-3-piece-app__visual-container__chart"})[4]
    print(Values)

    # Getting desired values from data

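What I'm hoping to get are the values after {"y" ...., so the numbers 5.6307, 6.3434, 9.1265, 8.2734, 8.9304 and 10.1252, but I can't for the life of me figure out how. I've tried Values.get_text as well as Values.text, but these just come back blank (probably because all the code is in a list or something similar).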

If I could get the data after "toolTipData" as well, that would be fine too.

Would anyone mind helping me out?

If I've left anything out, feedback would be appreciated so that I can ask better questions in the future.

Thanks

Answer 1

In short, the information you want is located inside a tag attribute, not in the tag's text.
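To illustrate why Values.text comes back blank, here is a minimal offline sketch. It assumes a trimmed-down stand-in for the markup in the question (the real attribute holds a much longer JSON string) and uses the built-in 'html.parser' so that lxml is not required:

```python
from bs4 import BeautifulSoup

# Trimmed-down stand-in for the markup in the question (assumption: the
# real data-chart-config attribute is much longer).
html = """
<div class="mod-3-piece-app__visual-container__chart">
    <div class="mod-ui-chart--dynamic" data-chart-config='{"chartData":{"actualValues":[5.6785,6.45]}}'>
    </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
chart = soup.find("div", {"class": "mod-ui-chart--dynamic"})

# The div contains only whitespace between its tags, so its text is empty.
text = chart.get_text(strip=True)

# The JSON lives in the attribute, which .get() returns as a plain string.
config = chart.get("data-chart-config")

print(repr(text))
print(config)
```

The numbers never show up in .text because they are part of an attribute value, so they have to be read with .get() (or chart["data-chart-config"]) and parsed separately.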

What I would do is:

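Open the page source and work out where your information is located
Use find_all to look for the right class attribute, mod-ui-chart--dynamic
For each element located with find_all, use .get() to read its data-chart-config attribute
Search the attribute's content string for the term 'actualValues'
If 'actualValues' is found, load the JSON and walk through its values.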

Try the following code. I've commented it, so you should be able to follow what it does.

Code:

from bs4 import BeautifulSoup
import urllib.request
import json

data = []
List = ['AAPL']

# Iterates Through List
for i in List:   
    # The webpage which we wish to Parse
    soup = BeautifulSoup(urllib.request.urlopen('https://markets.ft.com/data/equities/tearsheet/forecasts?s=AAPL:NSQ').read(), 'lxml')

    # Gathering the data
    elemList = soup.find_all('div', {'class':'mod-ui-chart--dynamic'})

    #read the `data-chart-config` attribute of each `div` with `class=mod-ui-chart--dynamic`
    for elem in elemList:

        elemID = elem.get('class')
        elemName = elem.get('data-chart-config')

        #if there's no value in elemName, pass...
        if elemName is None:
            pass

        #if the term 'actualValues' exists in elemName 
        elif 'actualValues' in elemName:
            #print('Extracting actualValues from:\n')
            #print("Attribute id = %s" % elemID)
            #print()
            #print("Attribute name = %s" % elemName)
            #print()

            #reading `data-chart-config` attribute as a json
            data = json.loads(elemName)

            #print(json.dumps(data, indent=4, sort_keys=True))
            #print(data['chartData']['actualValues'])

            #fetching desired info
            val1 = data['chartData']['actualValues'][0]
            val2 = data['chartData']['actualValues'][1]
            val3 = data['chartData']['actualValues'][2]
            val4 = data['chartData']['actualValues'][3]

            #printing desired values
            print(val1, val2, val3, val4)

            print('-'*15)

Output:

1.9 1.42 1.67 3.36
---------------
5.6785 6.45 9.22 8.31
---------------
50557000000 42358000000 46852000000 78351000000
---------------
170910000000 182795000000 233715000000 215639000000
---------------
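The code above reads only the first four entries by index. As a hedged alternative, the sketch below loops over the whole list and skips missing values. It assumes a trimmed copy of the attribute value from the question, in which the last two years are null:

```python
import json

# Trimmed copy of the data-chart-config value shown in the question.
raw = '{"chartData":{"actualValues":[5.6785,6.45,9.22,8.31,null,null]}}'

values = json.loads(raw)["chartData"]["actualValues"]

# JSON null is loaded as Python None; drop those entries.
reported = [v for v in values if v is not None]

print(*reported)  # 5.6785 6.45 9.22 8.31
```

This avoids hard-coding how many values are present, which may differ from chart to chart.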

P.S. 1: If you like, you can uncomment the print() calls inside the elif branch to see what the program is doing.

P.S. 2: If you like, you can change 'actualValues' in val1 = data['chartData']['actualValues'][0] to another key.
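For the values the question actually asks for (the "y" numbers and the "toolTipData" entries), here is a minimal sketch. It assumes they sit under the consensusData key, as in the HTML shown in the question, and works on a copy of that part of the attribute so it runs without a network request:

```python
import json

# consensusData portion of the data-chart-config attribute from the question.
raw = (
    '{"chartData":{"consensusData":['
    '{"y":5.6307,"toolTipData":{"low":5.5742,"high":5.7142,"analysts":34,"restatement":null}},'
    '{"y":6.3434,"toolTipData":{"low":6.25,"high":6.5714,"analysts":35,"restatement":null}},'
    '{"y":9.1265,"toolTipData":{"low":9.02,"high":9.28,"analysts":40,"restatement":null}},'
    '{"y":8.2734,"toolTipData":{"low":8.17,"high":8.335,"analysts":40,"restatement":null}},'
    '{"y":8.9304,"toolTipData":{"low":8.53,"high":9.63,"analysts":41,"restatement":null}},'
    '{"y":10.1252,"toolTipData":{"low":8.63,"high":11.61,"analysts":42,"restatement":null}}'
    ']}}'
)

consensus = json.loads(raw)["chartData"]["consensusData"]

# Each entry is a dict with a "y" value and a nested "toolTipData" dict.
y_values = [point["y"] for point in consensus]
tooltips = [point["toolTipData"] for point in consensus]

print(y_values)  # [5.6307, 6.3434, 9.1265, 8.2734, 8.9304, 10.1252]

for tip in tooltips:
    print(tip["low"], tip["high"], tip["analysts"])
```

In the scraping loop, the same two list comprehensions could be applied to the dict returned by json.loads(elemName), after checking that 'consensusData' is present in the attribute string.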
