I found a list of broken links on a website.
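import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Iterator;
import java.util.List;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

public static void main(String[] args) throws IOException {
    String homePage = "http://www.example.com";
    String url = "";
    HttpURLConnection huc = null;
    int respCode = 200;
    // Open the home page in Chrome and collect every anchor element on it.
    WebDriver driver = new ChromeDriver();
    driver.manage().window().maximize();
    driver.get(homePage);
    List<WebElement> links = driver.findElements(By.tagName("a"));
    Iterator<WebElement> it = links.iterator();
    while (it.hasNext()) {
        url = it.next().getAttribute("href");
        System.out.println(url);
        if (url == null || url.isEmpty()) {
            System.out.println("URL is either not configured for anchor tag or it is empty");
            continue;
        }
        if (!url.startsWith(homePage)) {
            System.out.println("URL belongs to another domain, skipping it.");
            continue;
        }
        // Send a HEAD request; a status of 400 or above marks the link as broken.
        huc = (HttpURLConnection) (new URL(url).openConnection());
        huc.setRequestMethod("HEAD");
        huc.connect();
        respCode = huc.getResponseCode();
        if (respCode >= 400) {
            System.out.println(url + " is a broken link");
        } else {
            System.out.println(url + " is a valid link");
        }
        ...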
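But I also need to find the parent link, that is, the URL of the page on which each broken URL appears. Is it possible to do this with Selenium (Java, IntelliJ IDEA)? An algorithm is enough (no code needed). Thanks.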
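
One possible approach, offered as a sketch under stated assumptions and not as a definitive answer: crawl the site breadth-first, and each time a link is checked, remember the page it was collected from. The result is a map from each broken URL to the set of pages that contain it. The class name `BrokenLinkParents`, the method `findBrokenLinkParents`, and the two callbacks `linksOnPage` and `isBroken` are hypothetical names introduced for illustration. In a real Selenium run, `linksOnPage` would presumably call `driver.get(page)` and read the `href` of every `a` element into a list of strings before navigating away (so the elements do not go stale), and `isBroken` would be the `HEAD` request check from the snippet above. Here both are replaced by an in-memory stand-in so the sketch runs without a browser.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical helper class; not part of Selenium.
class BrokenLinkParents {

    /**
     * Breadth-first crawl starting at homePage. Returns a map from each broken
     * URL to the set of page URLs (its "parents") on which it was found.
     * linksOnPage and isBroken are assumed callbacks supplied by the caller.
     */
    static Map<String, Set<String>> findBrokenLinkParents(
            String homePage,
            Function<String, List<String>> linksOnPage,
            Predicate<String> isBroken) {

        Map<String, Set<String>> brokenToParents = new LinkedHashMap<>();
        Map<String, Boolean> checked = new HashMap<>(); // each URL is checked only once
        Set<String> visited = new HashSet<>();          // pages already queued for crawling
        Deque<String> queue = new ArrayDeque<>();
        queue.add(homePage);
        visited.add(homePage);

        while (!queue.isEmpty()) {
            String page = queue.remove(); // the current parent page
            for (String url : linksOnPage.apply(page)) {
                // Same filters as the original snippet: skip empty and off-site links.
                if (url == null || url.isEmpty() || !url.startsWith(homePage)) {
                    continue;
                }
                boolean broken = checked.computeIfAbsent(url, isBroken::test);
                if (broken) {
                    // Record the page on which the broken link was seen.
                    brokenToParents.computeIfAbsent(url, k -> new LinkedHashSet<>()).add(page);
                } else if (visited.add(url)) {
                    // A working on-site link is crawled later as a page of its own.
                    queue.add(url);
                }
            }
        }
        return brokenToParents;
    }

    public static void main(String[] args) {
        // Hypothetical in-memory site used only for demonstration.
        Map<String, List<String>> site = new HashMap<>();
        site.put("http://www.example.com",
                Arrays.asList("http://www.example.com/a", "http://www.example.com/missing"));
        site.put("http://www.example.com/a",
                Arrays.asList("http://www.example.com/missing", "http://other.org/x"));

        Map<String, Set<String>> result = findBrokenLinkParents(
                "http://www.example.com",
                page -> site.getOrDefault(page, Collections.emptyList()),
                url -> url.endsWith("/missing"));

        result.forEach((broken, parents) ->
                System.out.println(broken + " is broken, found on " + parents));
    }
}
```

Keeping the parent pages in a set means a broken link that appears on several pages is reported once, together with every page that references it. With a real browser, the current page could also be read back with `driver.getCurrentUrl()` in case the site redirects.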