BeautifulSoup: using a particular CSS selector

Posted: 2016-09-15 10:01:07

Tags: python html css beautifulsoup

From this page, I need to get the status of "Anbindung an das Telefonnetz".

I've identified two ways to do it:

  1. Check whether the status contains the sentence "Das System arbeitet einwandfrei";
  2. Check whether the background color is green.

I chose the first option.

I'm using Python/BeautifulSoup to scrape the page. The problem is that there is no unique id, class or anything else to target this element, so I decided to use this element's CSS selector:

    div.system-item:nth-child(2) > div:nth-child(1) > p:nth-child(3)
    

And I use it like this:

    print(page.select("div.system-item:nth-child(2) > div:nth-child(1) > p:nth-child(3)"))
    

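However, all I get is an empty list ([]).

Is there anything else I can try to get this particular element?

EDIT
As some of you recommended, here is the full HTML source of the page. But really, I suggest you look at the page yourself.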

    <!doctype html>
    <head>
        <meta charset="utf-8">
    
                <title>Aktueller Status | Placetel</title>
    
        <meta http-equiv="X-UA-Compatible" content="IE=Edge">
        <meta name="msvalidate.01" content="756F6E40DD887A659CE83E5A92FFBB62">
        <meta name="viewport" content="width=device-width, initial-scale=1.0">
    
        <meta name="generator" content="Kirby 2.3.2">
    
        <meta name="description" content="Placetel Systemstatus: Erfahren Sie mehr &uuml;ber den aktuellen Status der Placetel Telefonanlage.">
        <meta name="keywords" content="">
    
            <meta name="robots" content="index,follow,noodp,noydir">
    
        <link rel="canonical" href="https://www.placetel.de/status">
        <link rel="publisher" href="https://plus.google.com/b/111027512373770716962/111027512373770716962/posts">
    
        <link rel="shortcut icon" href="/favicon.ico">
        <link rel="apple-touch-icon" href="/apple-touch-icon.png">
        <meta name="msapplication-TileColor" content="#0e70b9">
        <meta name="msapplication-TileImage" content="/ms-tile-icon.png">
        <meta name="theme-color" content="#0e70b9">
    
        <script src="//use.typekit.net/rnw8lad.js"></script>
        <script>try { Typekit.load({ async: true }); } catch (e) {}</script>
    
        <link rel="stylesheet" href="https://www.placetel.de/assets/dist/css/main.css">    <script src="https://www.placetel.de/assets/dist/js/modernizr.js"></script>
        <link rel="dns-prefetch" href="//app.marketizator.com"/>
        <script>
            var _mktz = _mktz || [];
            _mktz.cc_domain = 'placetel.de';
        </script>
        <script type="text/javascript" src="//d2tgfbvjf3q6hn.cloudfront.net/js/o17fe41.js"></script>
    </head>
    <body id="" class="page page-template-page-sections page-uid-status">
    
    <script>
        var gaProperty = 'UA-17631409-3';
        var disableStr = 'ga-disable-' + gaProperty;
        if (document.cookie.indexOf(disableStr + '=true') > -1) {
            window[disableStr] = true;
        }
        function gaOptout() {
            document.cookie = disableStr + '=true; expires=Thu, 31 Dec 2099 23:59:59 UTC; path=/';
            window[disableStr] = true;
        }
    </script>
    
    <!-- Google Tag Manager -->
    <noscript><iframe src="//www.googletagmanager.com/ns.html?id=GTM-KDNGCC"
                      height="0" width="0" style="display:none;visibility:hidden"></iframe></noscript>
    <script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':
            new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],
                                                      j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src=
            '//www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);
        })(window,document,'script','dataLayer','GTM-KDNGCC');</script>
    <!-- End Google Tag Manager -->
    <header class="header header-condensed" id="header">
        <div class="container-fluid">
    
    <nav class="navigation navigation-top">
        <ul>
                        <li class=" ">
                    <a title="Unternehmen" href="https://www.placetel.de/unternehmen">
    
                        <span>Unternehmen</span>
                    </a>
                </li>
                        <li class=" ">
                    <a title="Partner werden" href="https://www.placetel.de/partner">
    
                        <span>Partner werden</span>
                    </a>
                </li>
                        <li class=" ">
                    <a title="Support" href="https://www.placetel.de/support">
    
                        <span>Support</span>
                    </a>
                </li>
                        <li class=" ">
                    <a title="Suche" href="javascript:modal('search')">
    
                        <span>Suche</span>
                    </a>
                </li>
                    <li class="navigation-top-support">
                <a href="https://www.placetel.de/support">
                    <svg class="svg-phone"><use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://www.placetel.de/assets/dist/sprites/svg/sprite.1471515912.svg#svg-phone"></use></svg>                <span>0221 29 191 999</span>
                </a>
            </li>
            <li class="navigation-top-login">
                <a href="https://app.placetel.de/account/login">
                    <span>Login</span>
                </a>
            </li>
        </ul>
    </nav>    </div>
    
        <div class="container-fluid">
            <a class="site-logo" href="https://www.placetel.de">
                <svg class="svg-placetel-logo"><title>Placetel</title> <use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://www.placetel.de/assets/dist/sprites/svg/sprite.1471515912.svg#svg-placetel-logo"></use></svg>        </a>
    
    <nav class="navigation navigation-main" id="navigation-main">
        <ul>
    
                <li class="has-sub-navigation">
                    <a title="Telefonanlage" href="https://www.placetel.de/telefonanlage"
                       class="">
                        <span>Telefonanlage</span>
    
                                                <svg class="svg-arrow"><use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://www.placetel.de/assets/dist/sprites/svg/sprite.1471515912.svg#svg-arrow"></use></svg>                                    </a>
    
                                        <nav class="sub-navigation">
                            <ul>
                                                                <li class="">
                                        <a href="https://www.placetel.de/telefonanlage">
                                            Vorteile                                    </a>
                                    </li>
                                                                <li class="">
                                        <a href="https://www.placetel.de/telefonanlage/preise">
                                            Preise                                    </a>
                                    </li>
                                                                <li class="">
                                        <a href="https://www.placetel.de/telefonanlage/funktionen">
                                            Funktionen                                    </a>
                                    </li>
                                                                <li class="">
                                        <a href="https://www.placetel.de/telefonanlage/unified-communication">
                                            Unified Communication                                    </a>
                                    </li>
                                                                <li class="">
                                        <a href="https://www.placetel.de/telefonanlage/funktionsweise">
                                            Wie funktioniert es?                                    </a>
                                    </li>
                                                                <li class="">
                                        <a href="https://www.placetel.de/telefonanlage/isdn-abschaltung">
                                            ISDN-Abschaltung                                    </a>
                                    </li>
                                                                <li class="">
                                        <a href="https://www.placetel.de/telefonanlage/faq">
                                            FAQ                                    </a>
                                    </li>
                                                        </ul>
                        </nav>
                                </li>
    
                <li class="">
                    <a title="Trunking" href="https://www.placetel.de/sip-trunking"
                       class="">
                        <span>Trunking</span>
    
                                        </a>
    
                                </li>
    
                <li class="">
                    <a title="Mobilfunk" href="https://www.placetel.de/mobilfunk"
                       class="">
                        <span>Mobilfunk</span>
    
                                        </a>
    
                                </li>
    
                <li class="navigation-main-shop">
                    <a title="Endger&auml;te-Shop" href="/shop/"
                       class="">
                        <span>Endger&auml;te-Shop</span>
    
                                        </a>
    
                                </li>
    
                <li class="visible-xs-block visible-sm-block">
                    <a title="Support" href="https://www.placetel.de/support"
                       class="">
                        <span>Support</span>
    
                                        </a>
    
                                </li>
    
                <li class="visible-xs-block visible-sm-block">
                    <a title="Partner" href="https://www.placetel.de/partner"
                       class="">
                        <span>Partner</span>
    
                                        </a>
    
                                </li>
    
                <li class="has-sub-navigation visible-xs-block visible-sm-block">
                    <a title="Unternehmen" href="https://www.placetel.de/unternehmen"
                       class="">
                        <span>Unternehmen</span>
    
                                                <svg class="svg-arrow"><use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://www.placetel.de/assets/dist/sprites/svg/sprite.1471515912.svg#svg-arrow"></use></svg>                                    </a>
    
                                        <nav class="sub-navigation">
                            <ul>
                                                                <li class="">
                                        <a href="https://www.placetel.de/unternehmen">
                                            &Uuml;ber uns                                    </a>
                                    </li>
                                                                <li class="">
                                        <a href="https://www.placetel.de/unternehmen/technologie">
                                            Technologie                                    </a>
                                    </li>
                                                                <li class="">
                                        <a href="https://www.placetel.de/unternehmen/jobs">
                                            Jobs                                    </a>
                                    </li>
                                                                <li class="">
                                        <a href="https://www.placetel.de/unternehmen/events">
                                            Events                                    </a>
                                    </li>
                                                                <li class="">
                                        <a href="https://www.placetel.de/unternehmen/presse">
                                            Presse                                    </a>
                                    </li>
                                                                <li class="">
                                        <a href="https://www.placetel.de/unternehmen/kontakt">
                                            Kontakt                                    </a>
                                    </li>
                                                        </ul>
                        </nav>
                                </li>
    
                <li class="navigation-main-register">
                    <a title="Kostenlos testen!" href="javascript:modal('register')"
                       class="btn">
                        <span>Kostenlos testen!</span>
    
                                        </a>
    
                                </li>
                </ul>
    </nav>        
            <a class="site-navigation-toggle" id="hotdog">
                <i>
                    <span></span>
                </i> Menü
            </a>
        </div>
    </header>
    
    
                <section class="section section-full section-full-section-einleitung-text section-full-normal">
        <div class="container-fluid typography typography-dark">
                        <h2 class="section-full-title">Der Placetel System Status</h2>
    
                        <h3 class="section-full-subtitle">Jeden Tag einen Grund zur Freude.</h3>
    
                        <p>Wir bei Placetel haben ein Lieblingswort: „läuft“. Der Grund: Ihre Placetel Telefonanlage funktioniert nämlich immer. Darüber freuen wir uns natürlich riesig. Da aber erst eine geteilte Freude eine richtige Freude ist, haben wir Ihnen diese Statusseite eingerichtet.  Diese Seite informiert Sie jeden Tag über den einwandfreien Status Ihrer Anlage.<br />
    Und falls etwas mal nicht so perfekt funktionieren sollte wie gewohnt, können Sie uns den Fehler gern  melden.</p>        
                </div>
    
                <style>
                .section-full-section-einleitung-text {
                    background-color: ;
                }
            </style>
    
        </section>    
    
                <section class="section section-system">
        <a class="btn btn-primary btn-transparent btn-with-icon" href="javascript:location.reload();">
            <svg class="svg-refresh"><use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://www.placetel.de/assets/dist/sprites/svg/sprite.1471515912.svg#svg-refresh"></use></svg>        Status aktualisieren
        </a>
    
        <div class="system flex-grid typography typography-light">
            <div class="system-item system-item-green flex-grid-item">
                <div class="system-item-inner">
                    <h6>
                        System                </h6>
    
                    <i>
                        <svg class="svg-included"><use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://www.placetel.de/assets/dist/sprites/svg/sprite.1471515912.svg#svg-included"></use></svg>                    <svg class="svg-dots"><use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://www.placetel.de/assets/dist/sprites/svg/sprite.1471515912.svg#svg-dots"></use></svg>                    <svg class="svg-not-included"><use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://www.placetel.de/assets/dist/sprites/svg/sprite.1471515912.svg#svg-not-included"></use></svg>                </i>
    
                    <p>
                        Das System arbeitet einwandfrei<br>
                        11:10 Uhr
                    </p>
    
                                </div>
            </div>
    
            <div class="system-item system-item-green flex-grid-item">
                <div class="system-item-inner">
                    <h6>
                        Anbindung an das  Telefonnetz                </h6>
    
                    <i>
                        <svg class="svg-included"><use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://www.placetel.de/assets/dist/sprites/svg/sprite.1471515912.svg#svg-included"></use></svg>                    <svg class="svg-dots"><use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://www.placetel.de/assets/dist/sprites/svg/sprite.1471515912.svg#svg-dots"></use></svg>                    <svg class="svg-not-included"><use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://www.placetel.de/assets/dist/sprites/svg/sprite.1471515912.svg#svg-not-included"></use></svg>                </i>
    
                    <p>
                        Das System arbeitet einwandfrei<br>
                        11:10 Uhr
                    </p>
    
                                </div>
            </div>
    
            <div class="system-item system-item-green flex-grid-item">
                <div class="system-item-inner">
                    <h6>
                        Faxsystem                </h6>
    
                    <i>
                        <svg class="svg-included"><use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://www.placetel.de/assets/dist/sprites/svg/sprite.1471515912.svg#svg-included"></use></svg>                    <svg class="svg-dots"><use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://www.placetel.de/assets/dist/sprites/svg/sprite.1471515912.svg#svg-dots"></use></svg>                    <svg class="svg-not-included"><use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://www.placetel.de/assets/dist/sprites/svg/sprite.1471515912.svg#svg-not-included"></use></svg>                </i>
    
                    <p>
                        Das System arbeitet einwandfrei<br>
                        11:10 Uhr
                    </p>
    
                                </div>
            </div>
    
            <div class="system-item system-item-green flex-grid-item">
                <div class="system-item-inner">
                    <h6>
                        Konferenzsystem                </h6>
    
                    <i>
                        <svg class="svg-included"><use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://www.placetel.de/assets/dist/sprites/svg/sprite.1471515912.svg#svg-included"></use></svg>                    <svg class="svg-dots"><use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://www.placetel.de/assets/dist/sprites/svg/sprite.1471515912.svg#svg-dots"></use></svg>                    <svg class="svg-not-included"><use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://www.placetel.de/assets/dist/sprites/svg/sprite.1471515912.svg#svg-not-included"></use></svg>                </i>
    
                    <p>
                        Das System arbeitet einwandfrei<br>
                        11:10 Uhr
                    </p>
    
                                </div>
            </div>
    
            <div class="system-item system-item-green flex-grid-item">
                <div class="system-item-inner">
                    <h6>
                        Features und Optionen                </h6>
    
                    <i>
                        <svg class="svg-included"><use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://www.placetel.de/assets/dist/sprites/svg/sprite.1471515912.svg#svg-included"></use></svg>                    <svg class="svg-dots"><use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://www.placetel.de/assets/dist/sprites/svg/sprite.1471515912.svg#svg-dots"></use></svg>                    <svg class="svg-not-included"><use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://www.placetel.de/assets/dist/sprites/svg/sprite.1471515912.svg#svg-not-included"></use></svg>                </i>
    
                    <p>
                        Das System arbeitet einwandfrei<br>
                        11:10 Uhr
                    </p>
    
                                </div>
            </div>
        </div>
    </section>    
    
    </body>
    </html>
    

2 answers:

Answer 0 (score: 1)

As far as I know, BeautifulSoup4 still hasn't implemented the :nth-child pseudo-class. Also, if you investigate the site's CSS (namely the _system.scss file), you'll find that there are 3 states:

  1. system-item-green
  2. system-item-yellow
  3. system-item-red

So you might want to change your code slightly:

    import requests
    from bs4 import BeautifulSoup as BS

    url = 'https://www.placetel.de/status'
    headers = {
        'User-Agent': 'Mozilla/5.0 (X11; Linux i586; rv:31.0) Gecko/20100101 Firefox/31.0'
    }

    source = requests.get(url, headers=headers)
    soup = BS(source.text, 'html.parser')

    # The second div.system-item is "Anbindung an das Telefonnetz";
    # its colour class encodes the current state
    status = soup.select("div.system-item")[1].attrs['class']

    if 'system-item-green' in status:
        print("It works!")
    elif 'system-item-yellow' in status:
        print("Something's slightly wrong")
    elif 'system-item-red' in status:
        print("Does not work")
    else:
        print("Has someone changed page's markup?")
    
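This answer predates Beautiful Soup 4.7.0, which moved select() onto the Soup Sieve library, and Soup Sieve does implement :nth-child. With a current bs4, the question's original selector should therefore match. A minimal sketch, assuming bs4 4.7 or later is installed and using a trimmed excerpt of the markup posted in the question (HTML below) in place of the live page:

```python
from bs4 import BeautifulSoup

# Assumption: trimmed excerpt of the markup posted in the question,
# used instead of fetching the live page.
HTML = """
<div class="system flex-grid typography typography-light">
  <div class="system-item system-item-green flex-grid-item">
    <div class="system-item-inner">
      <h6>System</h6>
      <i></i>
      <p>Das System arbeitet einwandfrei<br>11:10 Uhr</p>
    </div>
  </div>
  <div class="system-item system-item-green flex-grid-item">
    <div class="system-item-inner">
      <h6>Anbindung an das  Telefonnetz</h6>
      <i></i>
      <p>Das System arbeitet einwandfrei<br>11:10 Uhr</p>
    </div>
  </div>
</div>
"""

# The selector from the question, unchanged
SELECTOR = "div.system-item:nth-child(2) > div:nth-child(1) > p:nth-child(3)"

soup = BeautifulSoup(HTML, "html.parser")
# Soup Sieve counts only element siblings for :nth-child,
# so the whitespace text nodes between tags do not shift the index.
matches = soup.select(SELECTOR)
status_text = matches[0].get_text(" ", strip=True) if matches else None
print(status_text)
```

Positional selectors like this one still depend on the order of the items, which is the fragility the other answer points out.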

Answer 1 (score: 1)

You can use the text to find the h6 for "Anbindung an das Telefonnetz" and then get its p sibling:

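    import re

    import requests
    from bs4 import BeautifulSoup

    r = requests.get("https://www.placetel.de/status").content
    soup = BeautifulSoup(r, "lxml")

    # \s+ tolerates the double space the page uses in "das  Telefonnetz"
    # (string= is the current name of the older text= argument)
    h6 = soup.find("h6", string=re.compile(r"Anbindung an das\s+Telefonnetz", re.I))
    if h6:
        print(h6.find_next_sibling("p"))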

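If you want full CSS3 selector support, you can use lxml's cssselect (this requires the cssselect package to be installed):

    from lxml import html

    tree = html.fromstring(r)  # r is the raw page content fetched above
    print(tree.cssselect("div.system-item:nth-child(2) > div:nth-child(1) > p:nth-child(3)"))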

You could also search for the text alone, so it won't matter if the h6 is changed to an h5 or any other tag:

    match = soup.find(string=re.compile(r"Anbindung an das\s+Telefonnetz", re.I))

    if match:
        print(match.parent.find_next_sibling("p").text)

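You can use the outer div to localize the text search; bs4 is very flexible. Simply selecting all the div.system-item elements and indexing into them will break if the order changes, and you won't notice because no error is raised, so searching for the text may actually be the safer approach.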
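Following that suggestion, here is a minimal sketch that restricts the search to the outer div.system container, keys every item by its whitespace-normalized h6 title, and raises an error instead of silently returning the wrong item. The helper names read_statuses and status_of are hypothetical, and HTML is an assumed trimmed excerpt of the markup from the question rather than the live page:

```python
from bs4 import BeautifulSoup

# Assumption: trimmed excerpt of the markup posted in the question,
# used instead of fetching the live page.
HTML = """
<section class="section section-system">
  <div class="system flex-grid typography typography-light">
    <div class="system-item system-item-green flex-grid-item">
      <div class="system-item-inner">
        <h6>System</h6>
        <i></i>
        <p>Das System arbeitet einwandfrei<br>11:10 Uhr</p>
      </div>
    </div>
    <div class="system-item system-item-green flex-grid-item">
      <div class="system-item-inner">
        <h6>Anbindung an das  Telefonnetz</h6>
        <i></i>
        <p>Das System arbeitet einwandfrei<br>11:10 Uhr</p>
      </div>
    </div>
  </div>
</section>
"""


def read_statuses(soup):
    """Hypothetical helper: map each h6 title inside div.system to (colour, status text)."""
    container = soup.select_one("div.system")  # localize the search to the outer div
    if container is None:
        raise ValueError("div.system not found; has the page markup changed?")
    statuses = {}
    for item in container.select("div.system-item"):
        title_tag = item.find("h6")
        text_tag = item.find("p")
        if title_tag is None or text_tag is None:
            continue
        # Collapse whitespace runs such as the double space in "das  Telefonnetz"
        title = " ".join(title_tag.get_text().split())
        # Colour suffix of a class like "system-item-green" -> "green"
        colour = next(
            (c[len("system-item-"):] for c in item.get("class", [])
             if c.startswith("system-item-")),
            None,
        )
        statuses[title] = (colour, text_tag.get_text(" ", strip=True))
    return statuses


def status_of(statuses, title):
    """Hypothetical helper: look up one item by title, failing loudly if it is missing."""
    if title not in statuses:
        raise KeyError(f"no status item titled {title!r}; found {sorted(statuses)}")
    return statuses[title]


soup = BeautifulSoup(HTML, "html.parser")
statuses = read_statuses(soup)
colour, text = status_of(statuses, "Anbindung an das Telefonnetz")
print(colour, "|", text)
```

Keying by title means a reordering of the items changes nothing, while a renamed or removed item raises a KeyError instead of quietly reporting another system's status.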