Java - Jsoup: selecting a specific part of a website

Time: 2015-01-22 09:24:25

Tags: java jsoup

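I am trying to read the Office 365 website so that I can compare it with our proxy configuration. However, I cannot find the right selector to fetch only the specific parts of the page that contain the URLs and IP addresses.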

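import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class Office365WebsiteParser {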

    Document doc = null;


    String WebseitenInhalt;

    public void Parser() {
        System.setProperty("http.proxyHost", "xxx");
        System.setProperty("http.proxyPort", "8081");
        System.setProperty("https.proxyHost", "xxx");
        System.setProperty("https.proxyPort", "8081");

        for (int i = 1; i <= 5; i++) {
            try {
                doc = Jsoup.connect("https://technet.microsoft.com/de-de/library/hh373144.aspx").userAgent("Mozilla").get();
                break; // Break immediately if successful
            } catch (IOException e) {
                // Swallow exception and try again
                System.out.println("jsoup Timeout occurred " + i + " time(s)");
            }
        }

        if (doc == null) {
            System.out.println("Connection timeout after 5 tries");
        } else { // If everything worked, evaluate the web page

            Elements urls_Office365_URLs = doc.select("div.codeSnippetContainerCode");


            // Select the page's HTML by div class and div id
            // urls_Office365_URLs_global = urls_Office365_URLs;

            WebseitenInhalt=urls_Office365_URLs.text();
        }

    }

    public void Print() {
        System.out.println(WebseitenInhalt);
    }

    public String get() {
        return WebseitenInhalt;
    }
}

I only want to select containers like this one:

<div class="codeSnippetContainerCodeContainer">
        <div class="codeSnippetToolBar">
            <div class="codeSnippetToolBarText">
                <a name="CodeSnippetCopyLink" style="display: none;" title="In Zwischenablage kopieren" href="javascript:if (window.epx.codeSnippet)window.epx.codeSnippet.copyCode('CodeSnippetContainerCode_0f6f9acf-6aa4-471f-8600-f8d059f95493');">Kopieren</a>
            </div>
        </div>
        <div id="CodeSnippetContainerCode_0f6f9acf-6aa4-471f-8600-f8d059f95493" class="codeSnippetContainerCode" dir="ltr">
            <div style="color:Black;"><pre>

*.live.com
*.officeapps.live.com
*.microsoft.com
*.glbdns.microsoft.com
*.microsoftonline.com
*.office365.com
*.office.com
Portal.Office.com
*.onmicrosoft.com

*.microsoftonline-p.com^
*.microsoftonline-p.net^
*.microsoftonlineimages.com^
*.microsoftonlinesupport.net^
*.msecnd.net^
*.msocdn.com^
*.msn.com^
*.msn.co.jp^
*.msn.co.uk^
*.office.net^

*.aadrm.com^^
*.cloudapp.net^^

*.activedirectory.windowsazure.com^^^
*.phonefactor.net^^^

</pre></div>
            
        </div>
    </div>
</div>
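
Once the text of such a `<pre>` block has been extracted, it still has to be split into individual entries before it can be compared with a proxy configuration. The following is a minimal sketch using only the Java standard library. The class `HostListParser` and the method `parseHosts` are hypothetical names, and treating the trailing `^` characters as footnote markers that should be stripped is an assumption based on the sample above, not something the page states.

```java
import java.util.ArrayList;
import java.util.List;

public class HostListParser {

    // Splits the text of a <pre> block into one entry per non-blank line and
    // removes trailing '^' characters (assumed here to be footnote markers).
    static List<String> parseHosts(String preText) {
        List<String> hosts = new ArrayList<>();
        for (String line : preText.split("\\R")) {
            String host = line.trim().replaceAll("\\^+$", "");
            if (!host.isEmpty()) {
                hosts.add(host);
            }
        }
        return hosts;
    }

    public static void main(String[] args) {
        // A few lines copied from the container shown in the question.
        String preText = "\n*.live.com\nPortal.Office.com\n\n*.msecnd.net^\n*.aadrm.com^^\n";
        for (String host : parseHosts(preText)) {
            System.out.println(host);
        }
    }
}
```

The resulting list could then be compared entry by entry against the patterns in the proxy configuration.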

1 answer:

Answer 0 (score: 0)

Try this CSS selector:

table:has(th:matches(.+-URLs?)) td:first-of-type pre

DEMO
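
As a rough illustration of how the selector above might be used with Jsoup, here is a small sketch that runs it against inline HTML instead of the live page. The markup in `main` is a hypothetical, simplified stand-in (including the made-up IP range), and `UrlTableSelector` and `extractEntries` are illustrative names. It assumes the real page places the URL lists in the first column of a table whose header text ends in "-URL" or "-URLs".

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class UrlTableSelector {

    // Selector from the answer: <pre> blocks inside the first <td> of each row,
    // in any table that has a header cell whose text matches ".+-URLs?".
    static final String SELECTOR = "table:has(th:matches(.+-URLs?)) td:first-of-type pre";

    // Returns every whitespace-separated entry found in the matching <pre> blocks.
    static List<String> extractEntries(String html) {
        Document doc = Jsoup.parse(html);
        List<String> entries = new ArrayList<>();
        for (Element pre : doc.select(SELECTOR)) {
            for (String token : pre.text().trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    entries.add(token);
                }
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // Hypothetical markup; the real page may be structured differently.
        String html =
            "<table>"
          + "<tr><th>Office 365-URLs</th><th>IP</th></tr>"
          + "<tr><td><div class=\"codeSnippetContainerCode\"><pre>\n*.live.com\n*.office.com\n</pre></div></td>"
          + "<td><div class=\"codeSnippetContainerCode\"><pre>\n10.0.0.0/8\n</pre></div></td></tr>"
          + "</table>"
          + "<div class=\"codeSnippetContainerCode\"><pre>unrelated snippet</pre></div>";
        System.out.println(extractEntries(html));
    }
}
```

Unlike `div.codeSnippetContainerCode`, which matches every code snippet on the page, this selector restricts the match to snippets in the first column of the URL tables, so the second column and the unrelated snippet are skipped.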