I have the following code:
public static void main(String[] args) throws IOException
{
    String absHref = "";
    String urlList = "";
    String relHref = "";
    Document doc = Jsoup.connect("https://www.planittesting.com").get();
    Elements links = doc.select("a[href]");
    for (Element link : links)
    {
        absHref = link.attr("abs:href");
        urlList = absHref.toString();
        System.out.println(urlList);
    }
}
But there are gaps in the results. What am I missing? I convert the relative URLs to absolute URLs, but some of them come back blank.
Answer 0 (score: 1)
If you use link.attr("href"); you can see that these href attributes are not empty. They contain something other than a page URL, for example:
javascript:__doPostBack('p$lt$ctl00$GeoLocator$rptCultures$ctl01$lbChangeSite','')
javascript:__doPostBack('p$lt$ctl00$GeoLocator$rptCultures$ctl02$lbChangeSite','')
javascript:__doPostBack('p$lt$ctl00$GeoLocator$rptCultures$ctl03$lbChangeSite','')
javascript:__doPostBack('p$lt$ctl00$GeoLocator$rptCultures$ctl04$lbChangeSite','')
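If you use link.attr("abs:href"); you get an empty string for each of these javascript: hrefs, because jsoup cannot turn them into an absolute URL.
You can add a simple check to fix it: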
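To see why a javascript: href ends up blank, here is a minimal sketch that uses only java.net.URL. It assumes jsoup's abs: resolution behaves roughly like this (resolve against the base URI, return an empty string on failure); the class name AbsHrefSketch and the helper resolve are hypothetical and not part of jsoup.

```java
import java.net.MalformedURLException;
import java.net.URL;

class AbsHrefSketch {
    // Hypothetical helper: resolve href against baseUri and return ""
    // when Java cannot build a URL from it (assumed to mirror abs:href).
    static String resolve(String baseUri, String href) {
        try {
            URL base = new URL(baseUri);
            return new URL(base, href).toExternalForm();
        } catch (MalformedURLException e) {
            // Unknown schemes such as javascript: end up here.
            return "";
        }
    }

    public static void main(String[] args) {
        String base = "https://www.planittesting.com";
        // A relative link resolves to an absolute URL.
        System.out.println("[" + resolve(base, "/uk/Home") + "]");
        // A javascript: link cannot be resolved, so the result is empty.
        System.out.println("[" + resolve(base, "javascript:__doPostBack('x','')") + "]");
    }
}
```

The square brackets in the printed lines only make the empty result visible.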
package com.github.davidepastore.stackoverflow35544869;

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

/**
 * Stackoverflow 35544869 question.
 */
public class App
{
    public static void main( String[] args ) throws IOException
    {
        Document doc = Jsoup.connect("https://www.planittesting.com").get();
        Elements links = doc.select("a[href]");
        for (Element link : links)
        {
            // abs:href is empty when the href cannot be made absolute (e.g. javascript: links)
            String absHref = link.attr("abs:href");
            if (!absHref.isEmpty()) {
                System.out.println(absHref);
            }
        }
    }
}
Output:
https://www.planittesting.com/uk/Home#main
https://www.planittesting.com/uk/Home
https://www.planittesting.com/uk/Home
https://www.linkedin.com/company/planit-software-testing
https://www.planittesting.com/uk/Course-Bookings
https://www.planittesting.com/uk/Contact
https://www.planittesting.com/
https://www.planittesting.com/uk/Services
https://www.planittesting.com/Services/Functional-Testing
https://www.planittesting.com/Services/Test-Automation
https://www.planittesting.com/Services/Performance-Testing
https://www.planittesting.com/Services/Accessibility-Testing
https://www.planittesting.com/Services/Security-Testing
https://www.planittesting.com/Services/Mobile-App-Testing
https://www.planittesting.com/Services/Digital-Testing
https://www.planittesting.com/Services/Agile-Testing
https://www.planittesting.com/Services/Non-Agile-Testing
https://www.planittesting.com/Services/Test-Strategy
https://www.planittesting.com/Services/Test-Management
https://www.planittesting.com/Services/Process-Improvement
https://www.planittesting.com/Services/DevOps-Solutions
https://www.planittesting.com/Services/Service-Virtualisation
https://www.planittesting.com/Services/Application-Monitoring-Solutions
https://www.planittesting.com/Services/Test-Management-as-a-Service
https://www.planittesting.com/Services/Performance-Testing-Solutions
https://www.planittesting.com/Services/Tools-Licensing
https://www.planittesting.com/Services/On-site-Testing
https://www.planittesting.com/Services/Off-site-Testing
https://www.planittesting.com/Services/Off-shore-Testing
https://www.planittesting.com/uk/Training
https://www.planittesting.com/Training/Software-Testing
https://www.planittesting.com/Training/ISTQB-Foundation-Certificate
https://www.planittesting.com/Training/ISTQB-Advanced-Test-Analyst
https://www.planittesting.com/Training/ISTQB-Advanced-Test-Manager
https://www.planittesting.com/Training/Software-Testing
https://www.planittesting.com/Training/Agile
https://www.planittesting.com/Training/ISTQB-Foundation-Agile-Tester-Extension
https://www.planittesting.com/Training/Certified-Agile-Essentials
https://www.planittesting.com/Training/Certified-Agile-Business-Analysis
https://www.planittesting.com/Training/Certified-Agile-Tester
https://www.planittesting.com/Training/Business-Analysis
https://www.planittesting.com/Training/BCS-Business-Analysis-Foundation
https://www.planittesting.com/Training/BCS-Requirements-Engineering-Certificate
https://www.planittesting.com/Training/BCS-Modelling-Business-Processes
https://www.planittesting.com/Training/BCS-Business-Analysis-Practice
https://www.planittesting.com/Training/Classroom
https://www.planittesting.com/Training/Virtual-Learning
https://www.planittesting.com/Training/Schedule
https://www.planittesting.com/uk/Insights
https://www.planittesting.com/uk/About
https://www.planittesting.com/uk/Join-Our-Team
https://www.planittesting.com/uk/Contact
https://www.planittesting.com/Services
https://www.planittesting.com/Services/Mobile-App-Testing
https://www.planittesting.com/Planit-Testing-Index
https://www.planittesting.com/Training/ISTQB-Foundation-Agile-Tester-Extension
https://www.planittesting.com/Services/Service-Virtualisation
https://www.planittesting.com/Services/Functional-Testing
https://www.planittesting.com/Services/Test-Automation
https://www.planittesting.com/Services/Performance-Testing
https://www.planittesting.com/Services/Accessibility-Testing
https://www.planittesting.com/Services/Security-Testing
https://www.planittesting.com/Services/Mobile-App-Testing
https://www.planittesting.com/Services/Digital-Testing
https://www.planittesting.com/Services/Agile-Testing
https://www.planittesting.com/Services/Non-Agile-Testing
https://www.planittesting.com/Services/Test-Strategy
https://www.planittesting.com/Services/Test-Management
https://www.planittesting.com/Services/Process-Improvement
https://www.planittesting.com/Services/DevOps-Solutions
https://www.planittesting.com/Services/Application-Monitoring-Solutions
https://www.planittesting.com/Services/Performance-Testing-Solutions
https://www.planittesting.com/Services/Test-Management-as-a-Service
https://www.planittesting.com/Services/Service-Virtualisation
https://www.planittesting.com/Services/Tools-Licensing
https://www.planittesting.com/Services
https://www.planittesting.com/Training/Software-Testing
https://www.planittesting.com/Training/Agile
https://www.planittesting.com/Training/Business-Analysis
https://www.planittesting.com/Training
https://www.planittesting.com/Insights/Cricket-Australia-Case-Study
https://www.planittesting.com/Insights/Lend-Lease-Case-Study
https://www.planittesting.com/Insights/Panviva-Case-Study
https://www.planittesting.com/Contact
https://www.planittesting.com/
https://www.linkedin.com/company/planit-software-testing
https://www.linkedin.com/grp/home?gid=4561841
mailto:infouk@planittesting.com
https://www.planittesting.com/uk/Services
https://www.planittesting.com/uk/Services/Functional-Testing
https://www.planittesting.com/uk/Services/Test-Automation
https://www.planittesting.com/uk/Services/Performance-Testing
https://www.planittesting.com/uk/Services/Accessibility-Testing
https://www.planittesting.com/uk/Tools
https://www.planittesting.com/uk/Tools/Service-Virtualisation
https://www.planittesting.com/uk/Tools/Application-Monitoring
https://www.planittesting.com/uk/Tools/Performance-Testing-Solutions
https://www.planittesting.com/uk/Tools/Test-Management-as-a-Service
https://www.planittesting.com/uk/Training
https://www.planittesting.com/uk/Training/Software-Testing
https://www.planittesting.com/uk/Training/Business-Analysis
https://www.planittesting.com/uk/Training/Agile
https://www.planittesting.com/uk/Training/Full-Course-Schedule
https://www.planittesting.com/uk/About
https://www.planittesting.com/uk/About/Planit-Testing-Index
https://www.planittesting.com/uk/About/Jobs-Board
https://www.planittesting.com/uk/About/Careers-at-Planit
https://www.planittesting.com/uk/About/Bootcamp
https://www.planittesting.com/uk/Contact
https://www.planittesting.com/uk/Contact/Office-1
https://www.planittesting.com/uk/Contact/Office-2
https://www.planittesting.com/uk/Contact/Office-3
https://www.planittesting.com/uk/Contact/Office-4
https://www.planittesting.com/uk/Footer-Navigation/Privacy
https://www.planittesting.com/uk/Footer-Navigation/Terms-Conditions
Answer 1 (score: 0)
You can fine-tune the original CSS selector:
a[href]:not([href~=(?i)^(javascript|tel|mailto)])
a[href] /* Select any anchor with an href attribute ... */
:not( /* ... whose href does not start ... */
[href~=(?i)^(javascript|tel|mailto)] /* ... with javascript, tel or mailto (case-insensitive) */
)
Original selector: a[href]
Found 121 links
Fine-tuned selector: a[href]:not([href~=(?i)^(javascript|tel|mailto)])
Found 115 links
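The same regular expression can also be applied to href strings after extraction. The sketch below uses plain java.util.regex rather than jsoup. The names HrefFilterSketch and keepNavigable are hypothetical, and the use of find() assumes the selector's ~= operator matches anywhere in the value, which is why the pattern carries its own ^ anchor.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class HrefFilterSketch {
    // Same pattern as in the fine-tuned selector above.
    static final Pattern NON_PAGE = Pattern.compile("(?i)^(javascript|tel|mailto)");

    // Hypothetical helper: keep only hrefs that do not start with
    // javascript, tel or mailto (case-insensitive).
    static List<String> keepNavigable(List<String> hrefs) {
        return hrefs.stream()
                .filter(h -> !NON_PAGE.matcher(h).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> hrefs = Arrays.asList(
                "/uk/Home",
                "javascript:__doPostBack('x','')",
                "MAILTO:infouk@planittesting.com",
                "https://www.planittesting.com/uk/Contact");
        for (String href : keepNavigable(hrefs)) {
            System.out.println(href);
        }
    }
}
```

One caveat: because the pattern has no trailing colon, it would also drop a relative href such as telephone.html. Using ^(javascript|tel|mailto): is a tighter variant if that matters for your pages.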