Jsoup - Working with URLs

Date: 2016-02-22 02:29:22

Tags: url jsoup

I have the following code:

public static void main (String args[]) throws IOException
{
    String absHref = "";
    String urlList = "";
    String relHref = "";

    Document doc = Jsoup.connect("https://www.planittesting.com").get();
    Elements links = doc.select("a[href]"); 
    for (Element link : links) 
    {
        absHref = link.attr("abs:href");
        urlList = absHref.toString();
        System.out.println(urlList);
    }
}

But there are gaps in the results. What am I missing? I am converting the relative URLs to absolute URLs, but some of them come back blank.


2 answers:

Answer 0: (score: 1)

If you use link.attr("href");, you can see that these href attributes are not empty; they contain something other than a URL, for example:

javascript:__doPostBack('p$lt$ctl00$GeoLocator$rptCultures$ctl01$lbChangeSite','')
javascript:__doPostBack('p$lt$ctl00$GeoLocator$rptCultures$ctl02$lbChangeSite','')
javascript:__doPostBack('p$lt$ctl00$GeoLocator$rptCultures$ctl03$lbChangeSite','')
javascript:__doPostBack('p$lt$ctl00$GeoLocator$rptCultures$ctl04$lbChangeSite','')

If you use link.attr("abs:href"); instead, you get a blank value for every href that holds JavaScript rather than a URL.
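One plausible explanation, assuming jsoup resolves abs:href through java.net.URL (an assumption about the jsoup version in use, not something confirmed here): the JDK has no handler registered for the javascript scheme, so building the URL fails and an empty string is returned as a fallback. The sketch below reproduces that behavior with the JDK alone; AbsHrefSketch and resolve are hypothetical names, not jsoup API.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper, not part of jsoup: resolves an href against a base URL
// and falls back to "" when the JDK cannot build a URL from it.
class AbsHrefSketch {

    static String resolve(String baseUri, String href) {
        try {
            // Note: the URL constructors are deprecated since Java 20 but still available.
            URL base = new URL(baseUri);
            // Relative paths are resolved against the base; absolute URLs pass through.
            return new URL(base, href).toExternalForm();
        } catch (MalformedURLException e) {
            // Thrown for schemes without a registered handler, e.g. "javascript".
            return "";
        }
    }

    public static void main(String[] args) {
        String base = "https://www.planittesting.com/";
        System.out.println("[" + resolve(base, "/uk/Contact") + "]");
        System.out.println("[" + resolve(base, "javascript:__doPostBack('x','')") + "]");
    }
}
```

The brackets in the printed lines only make the empty result visible.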

You can add a simple check to fix it:

package com.github.davidepastore.stackoverflow35544869;

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

/**
 * Stackoverflow 35544869 question.
 *
 */
public class App 
{
    public static void main( String[] args ) throws IOException
    {
        String absHref = "";
        String urlList = "";
        String relHref = "";

        Document doc = Jsoup.connect("https://www.planittesting.com").get();
        Elements links = doc.select("a[href]"); 
        for (Element link : links) 
        {
            absHref = link.attr("abs:href");
            if(!absHref.isEmpty()){
                urlList = absHref.toString();
                System.out.println(urlList);
            }
        }
    }
}

Output:

https://www.planittesting.com/uk/Home#main
https://www.planittesting.com/uk/Home
https://www.planittesting.com/uk/Home
https://www.linkedin.com/company/planit-software-testing
https://www.planittesting.com/uk/Course-Bookings
https://www.planittesting.com/uk/Contact
https://www.planittesting.com/
https://www.planittesting.com/uk/Services
https://www.planittesting.com/Services/Functional-Testing
https://www.planittesting.com/Services/Test-Automation
https://www.planittesting.com/Services/Performance-Testing
https://www.planittesting.com/Services/Accessibility-Testing
https://www.planittesting.com/Services/Security-Testing
https://www.planittesting.com/Services/Mobile-App-Testing
https://www.planittesting.com/Services/Digital-Testing
https://www.planittesting.com/Services/Agile-Testing
https://www.planittesting.com/Services/Non-Agile-Testing
https://www.planittesting.com/Services/Test-Strategy
https://www.planittesting.com/Services/Test-Management
https://www.planittesting.com/Services/Process-Improvement
https://www.planittesting.com/Services/DevOps-Solutions
https://www.planittesting.com/Services/Service-Virtualisation
https://www.planittesting.com/Services/Application-Monitoring-Solutions
https://www.planittesting.com/Services/Test-Management-as-a-Service
https://www.planittesting.com/Services/Performance-Testing-Solutions
https://www.planittesting.com/Services/Tools-Licensing
https://www.planittesting.com/Services/On-site-Testing
https://www.planittesting.com/Services/Off-site-Testing
https://www.planittesting.com/Services/Off-shore-Testing
https://www.planittesting.com/uk/Training
https://www.planittesting.com/Training/Software-Testing
https://www.planittesting.com/Training/ISTQB-Foundation-Certificate
https://www.planittesting.com/Training/ISTQB-Advanced-Test-Analyst
https://www.planittesting.com/Training/ISTQB-Advanced-Test-Manager
https://www.planittesting.com/Training/Software-Testing
https://www.planittesting.com/Training/Agile
https://www.planittesting.com/Training/ISTQB-Foundation-Agile-Tester-Extension
https://www.planittesting.com/Training/Certified-Agile-Essentials
https://www.planittesting.com/Training/Certified-Agile-Business-Analysis
https://www.planittesting.com/Training/Certified-Agile-Tester
https://www.planittesting.com/Training/Business-Analysis
https://www.planittesting.com/Training/BCS-Business-Analysis-Foundation
https://www.planittesting.com/Training/BCS-Requirements-Engineering-Certificate
https://www.planittesting.com/Training/BCS-Modelling-Business-Processes
https://www.planittesting.com/Training/BCS-Business-Analysis-Practice
https://www.planittesting.com/Training/Classroom
https://www.planittesting.com/Training/Virtual-Learning
https://www.planittesting.com/Training/Schedule
https://www.planittesting.com/uk/Insights
https://www.planittesting.com/uk/About
https://www.planittesting.com/uk/Join-Our-Team
https://www.planittesting.com/uk/Contact
https://www.planittesting.com/Services
https://www.planittesting.com/Services/Mobile-App-Testing
https://www.planittesting.com/Planit-Testing-Index
https://www.planittesting.com/Training/ISTQB-Foundation-Agile-Tester-Extension
https://www.planittesting.com/Services/Service-Virtualisation
https://www.planittesting.com/Services/Functional-Testing
https://www.planittesting.com/Services/Test-Automation
https://www.planittesting.com/Services/Performance-Testing
https://www.planittesting.com/Services/Accessibility-Testing
https://www.planittesting.com/Services/Security-Testing
https://www.planittesting.com/Services/Mobile-App-Testing
https://www.planittesting.com/Services/Digital-Testing
https://www.planittesting.com/Services/Agile-Testing
https://www.planittesting.com/Services/Non-Agile-Testing
https://www.planittesting.com/Services/Test-Strategy
https://www.planittesting.com/Services/Test-Management
https://www.planittesting.com/Services/Process-Improvement
https://www.planittesting.com/Services/DevOps-Solutions
https://www.planittesting.com/Services/Application-Monitoring-Solutions
https://www.planittesting.com/Services/Performance-Testing-Solutions
https://www.planittesting.com/Services/Test-Management-as-a-Service
https://www.planittesting.com/Services/Service-Virtualisation
https://www.planittesting.com/Services/Tools-Licensing
https://www.planittesting.com/Services
https://www.planittesting.com/Training/Software-Testing
https://www.planittesting.com/Training/Agile
https://www.planittesting.com/Training/Business-Analysis
https://www.planittesting.com/Training
https://www.planittesting.com/Insights/Cricket-Australia-Case-Study
https://www.planittesting.com/Insights/Lend-Lease-Case-Study
https://www.planittesting.com/Insights/Panviva-Case-Study
https://www.planittesting.com/Contact
https://www.planittesting.com/
https://www.linkedin.com/company/planit-software-testing
https://www.linkedin.com/grp/home?gid=4561841
mailto:infouk@planittesting.com
https://www.planittesting.com/uk/Services
https://www.planittesting.com/uk/Services/Functional-Testing
https://www.planittesting.com/uk/Services/Test-Automation
https://www.planittesting.com/uk/Services/Performance-Testing
https://www.planittesting.com/uk/Services/Accessibility-Testing
https://www.planittesting.com/uk/Tools
https://www.planittesting.com/uk/Tools/Service-Virtualisation
https://www.planittesting.com/uk/Tools/Application-Monitoring
https://www.planittesting.com/uk/Tools/Performance-Testing-Solutions
https://www.planittesting.com/uk/Tools/Test-Management-as-a-Service
https://www.planittesting.com/uk/Training
https://www.planittesting.com/uk/Training/Software-Testing
https://www.planittesting.com/uk/Training/Business-Analysis
https://www.planittesting.com/uk/Training/Agile
https://www.planittesting.com/uk/Training/Full-Course-Schedule
https://www.planittesting.com/uk/About
https://www.planittesting.com/uk/About/Planit-Testing-Index
https://www.planittesting.com/uk/About/Jobs-Board
https://www.planittesting.com/uk/About/Careers-at-Planit
https://www.planittesting.com/uk/About/Bootcamp
https://www.planittesting.com/uk/Contact
https://www.planittesting.com/uk/Contact/Office-1
https://www.planittesting.com/uk/Contact/Office-2
https://www.planittesting.com/uk/Contact/Office-3
https://www.planittesting.com/uk/Contact/Office-4
https://www.planittesting.com/uk/Footer-Navigation/Privacy
https://www.planittesting.com/uk/Footer-Navigation/Terms-Conditions
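The output above still contains a mailto: entry and several repeated URLs. If only distinct web links are wanted, a post-processing step along these lines might help. This is a minimal sketch using only the JDK; LinkCleaner and webLinksOnly are hypothetical names, and the sample list is a small excerpt of the output above.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical post-processing step for the collected links.
class LinkCleaner {

    // Keeps http/https links only and drops repeats, preserving first-seen order.
    static Set<String> webLinksOnly(List<String> links) {
        Set<String> unique = new LinkedHashSet<>();
        for (String link : links) {
            try {
                String scheme = new URI(link).getScheme();
                if ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme)) {
                    unique.add(link);
                }
            } catch (URISyntaxException e) {
                // Not a parseable URI: skip it rather than fail the whole run.
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<String> links = List.of(
            "https://www.planittesting.com/uk/Home",
            "https://www.planittesting.com/uk/Home",
            "mailto:infouk@planittesting.com",
            "https://www.planittesting.com/uk/Contact");
        webLinksOnly(links).forEach(System.out::println);
    }
}
```

A LinkedHashSet is used so the links keep the order in which they appeared on the page.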

Answer 1: (score: 0)

You can fine-tune the original CSS selector:

a[href]:not([href~=(?i)^(javascript|tel|mailto)])

Description

a[href]                               /* Select any anchor with an href attribute ... */
:not(                                 /* ... whose href does not start ... */
 [href~=(?i)^(javascript|tel|mailto)] /* ... with javascript, tel or mailto (case-insensitive) */
)
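To check which href values the pattern excludes without fetching the page, the regex can be exercised directly with java.util.regex. This sketch assumes that jsoup's [attr~=regex] selector does a find-style match rather than a whole-string match, which is why the ^ anchor matters; HrefFilterSketch, isExcluded and the sample values are illustrative only.

```java
import java.util.List;
import java.util.regex.Pattern;

class HrefFilterSketch {

    // Same pattern as in the selector: case-insensitive, anchored at the start.
    static final Pattern NON_WEB = Pattern.compile("(?i)^(javascript|tel|mailto)");

    // True when the href would be excluded by the :not(...) clause.
    static boolean isExcluded(String href) {
        return NON_WEB.matcher(href).find();
    }

    public static void main(String[] args) {
        List<String> hrefs = List.of(
            "/uk/Contact",
            "JavaScript:__doPostBack('x','')",
            "mailto:infouk@planittesting.com",
            "tel:+440000000000",
            "https://www.planittesting.com/");
        for (String href : hrefs) {
            System.out.println((isExcluded(href) ? "skip  " : "keep  ") + href);
        }
    }
}
```

Note that the pattern has no trailing colon, so a relative href such as telephone.html would also be excluded; writing the group as (javascript|tel|mailto): would tighten it.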

Demo