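I am using the following code to find broken links on a web page. But what should I do if I want to check the entire website, including the pages reached through internal links? Any guidance would be appreciated. Thanks.
Checking a web page for broken links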
// driver, homePage, url, huc and respCode are declared elsewhere (not shown).
List<WebElement> links = driver.findElements(By.tagName("a"));
Iterator<WebElement> it = links.iterator();
while (it.hasNext()) {
    url = it.next().getAttribute("href");
    System.out.println(url);
    if (url == null || url.isEmpty()) {
        System.out.println("URL is either not configured for anchor tag or it is empty");
        continue;
    }
    if (!url.startsWith(homePage)) {
        System.out.println("URL belongs to another domain, skipping it.");
        continue;
    }
    try {
        huc = (HttpURLConnection) (new URL(url).openConnection());
        // HEAD asks for the status line and headers only, without the body.
        huc.setRequestMethod("HEAD");
        huc.connect();
        respCode = huc.getResponseCode();
        if (respCode >= 400) {
            System.out.println(url + " is a broken link");
        } else {
            System.out.println(url + " is a valid link");
        }
    } catch (MalformedURLException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
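Answer 0 (score: 0)
Your approach is sound. To check the status of each link after retrieving the href attribute from the <a> tags, you can write a function that accepts the href as a parameter and prints the corresponding status, as follows:
Function to check the link status: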
private void CheckingLink(String linkURL)
{
    try {
        URL url = new URL(linkURL);
        HttpURLConnection httpUrlConnect = (HttpURLConnection) url.openConnection();
        httpUrlConnect.setConnectTimeout(5000);
        httpUrlConnect.connect();
        // Read the status once instead of once per comparison.
        int code = httpUrlConnect.getResponseCode();
        String message = httpUrlConnect.getResponseMessage();
        // Only these four statuses are reported; any other code prints nothing.
        if (code == 200 || code == 500 || code == 404 || code == 402)
        {
            System.out.println(linkURL + " - " + message);
        }
        // For 404, a second line is printed that also shows the numeric code.
        if (code == HttpURLConnection.HTTP_NOT_FOUND)
        {
            System.out.println(linkURL + " - " + message + " - " + HttpURLConnection.HTTP_NOT_FOUND);
        }
    } catch (IOException e)
    {
        // Covers MalformedURLException (e.g. a null, empty or javascript: href)
        // as well as connection errors such as an unknown host.
        System.out.println(e.getMessage());
    }
}
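Note that CheckingLink() hands whatever string it receives to new URL(...), so a missing href, an empty href or a javascript: link ends up in the catch block instead of producing a status line. If that is not wanted, one option is to filter the hrefs first. The following is only a sketch: the class name LinkFilter and the methods isCheckable and uniqueCheckable are hypothetical names made up for illustration, and it assumes that only http and https links should be checked and that repeated links need to be checked only once.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Selenium or the JDK.
final class LinkFilter {
    private LinkFilter() {}

    // True only for hrefs that start with http:// or https:// (case-insensitive).
    // Null, empty, javascript: and scheme-less values are rejected.
    static boolean isCheckable(String href) {
        if (href == null) {
            return false;
        }
        String lower = href.trim().toLowerCase(Locale.ROOT);
        return lower.startsWith("http://") || lower.startsWith("https://");
    }

    // Keeps the checkable hrefs, drops duplicates, preserves first-seen order.
    static Set<String> uniqueCheckable(List<String> hrefs) {
        Set<String> result = new LinkedHashSet<>();
        for (String href : hrefs) {
            if (isCheckable(href)) {
                result.add(href.trim());
            }
        }
        return result;
    }
}
```

With this in place, the calling loop could collect the href values into a list, pass it through LinkFilter.uniqueCheckable(...), and call CheckingLink() only on what remains.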
Calling the function CheckingLink():
List<WebElement> elements = driver.findElements(By.tagName("a"));
System.out.println("Number of WebElements on this page : " + elements.size());
for (WebElement ele : elements)
{
    String url = ele.getAttribute("href");
    CheckingLink(url);
}
Running this against the URL https://in.yahoo.com/?p=us produces the following output on the console:
Number of WebElements on this page : 105
https://in.yahoo.com/ - OK
https://mail.yahoo.com/?.intl=in&.lang=en-IN&.partner=none&.src=fp - OK
https://in.news.yahoo.com/ - OK
https://cricket.yahoo.com/ - OK
https://in.finance.yahoo.com/ - OK
https://in.style.yahoo.com/tagged/celebrity - OK
https://in.style.yahoo.com/tagged/movies - OK
https://in.style.yahoo.com/ - OK
https://in.mobile.yahoo.com/ - OK
https://in.yahoo.com/everything/ - OK
https://in.answers.yahoo.com/ - OK
https://in.groups.yahoo.com/ - OK
https://in.messenger.yahoo.com/ - OK
https://in.news.yahoo.com/weather - OK
https://in.yahoo.com/everything/world - OK
https://in.yahoo.com/ - OK
https://login.yahoo.com/config/login?.src=fpctx&.intl=in&.lang=en-IN&.done=https%3A%2F%2Fin.yahoo.com - OK
https://mail.yahoo.com/?.intl=in&.lang=en-IN&.partner=none&.src=fp - OK
https://login.yahoo.com/config/login?.src=fpctx&.intl=in&.lang=en-IN&.done=https%3A%2F%2Fin.yahoo.com - OK
https://in.yahoo.com/?p=us#mega-bottombar-mail - OK
https://in.yahoo.com/?p=us#Main - OK
https://in.yahoo.com/?p=us#Aside - OK
https://mail.yahoo.com/?.intl=in&.lang=en-IN&.partner=none&.src=fp - OK
https://cricket.yahoo.com/ - OK
https://in.news.yahoo.com/ - OK
https://in.finance.yahoo.com/ - OK
https://in.style.yahoo.com/ - OK
https://in.style.yahoo.com/tagged/movies - OK
https://in.style.yahoo.com/tagged/celebrity - OK
http://in.travelinspirations.yahoo.com/ - OK
https://in.yahoo.com/everything/ - OK
https://in.news.yahoo.com/video/32-episode-1-095405056.html - OK
https://cricket.yahoo.net/scores/india-vs-afghanistan-oneoff-test-14th-june-2018-inaf06142018185950-summary - OK
https://cricket.yahoo.net/scores/india-vs-afghanistan-oneoff-test-14th-june-2018-inaf06142018185950-summary - OK
https://in.news.yahoo.com/fed-bengaluru-traffic-techie-rides-085447032.html - OK
https://in.news.yahoo.com/photos-eid-ul-fitr-celebrations-slideshow-wp-095013253.html - OK
https://in.style.yahoo.com/quick-look-actor-plays-race-slideshow-wp-102506088.html - OK
https://in.style.yahoo.com/five-crucial-things-know-blood-103318158.html - OK
https://in.news.yahoo.com/boy-america-contracts-bubonic-plague-113108819.html - OK
https://in.style.yahoo.com/janhvi-khushi-anshula-holiday-london-dad-boney-kapoor-064018621.html - OK
https://in.style.yahoo.com/janhvi-khushi-anshula-holiday-london-dad-boney-kapoor-064018621.html - OK
https://in.style.yahoo.com/janhvi-khushi-anshula-holiday-london-dad-boney-kapoor-064018621.html - OK
https://beap.gemini.yahoo.com/mbclk?bv=1.0.0&es=8j5uUzIGIS8bthoOIIlefINlCyUX0sMagCIuZQ05jmBfB74DwldI_rYOX1OS5kBByKf6VXv1ZfletO8DFuwVrss1EH7zcp7sC3mOkIDCDckHezCh6uetN9gABHeBIVJhY_Gh2YQZYlGcNjg0Ls4p9bZZt6jMNKDm_Deq0awAlb3iWN9MmuRf_3FnL8iztj2LLuB2G4qXUU5aZe_8bv54J3eChnAjgZEpXOjwZ0PX.aDMFrGxPY80WmXuIOd_k7ddLrVufsMXvVGZDkbqPaoyUidc2jukZlTGmbtJsq9PgokEscfHPYWw4KjDZT4js_9x74ME6IB.Pg3f6zuO1S6cb9kuc7WZ6wtRj73lilaXMuXv_mp5N7HB1USXa0Qy.S.PSZOX7kxczmPfD7znequq2Cova59KLDCDgj_kJM8zAGMKDrm7iWBTQuVlpY_lfv5YibTeKfJRtmJYnkJQ.XakDf6k6gOLWmWkJjuA9pVDUZKkMrCXwY8yRInyKIoMPMdPDa4kRIh1ghW2K7VLJfjGu6qXW1kPGFVRTF0wKkN4JKY4J.TLPlSEI9uuudXnam8OY5RZJA--%26lp= - OK
https://beap.gemini.yahoo.com/mbclk?bv=1.0.0&es=8j5uUzIGIS8bthoOIIlefINlCyUX0sMagCIuZQ05jmBfB74DwldI_rYOX1OS5kBByKf6VXv1ZfletO8DFuwVrss1EH7zcp7sC3mOkIDCDckHezCh6uetN9gABHeBIVJhY_Gh2YQZYlGcNjg0Ls4p9bZZt6jMNKDm_Deq0awAlb3iWN9MmuRf_3FnL8iztj2LLuB2G4qXUU5aZe_8bv54J3eChnAjgZEpXOjwZ0PX.aDMFrGxPY80WmXuIOd_k7ddLrVufsMXvVGZDkbqPaoyUidc2jukZlTGmbtJsq9PgokEscfHPYWw4KjDZT4js_9x74ME6IB.Pg3f6zuO1S6cb9kuc7WZ6wtRj73lilaXMuXv_mp5N7HB1USXa0Qy.S.PSZOX7kxczmPfD7znequq2Cova59KLDCDgj_kJM8zAGMKDrm7iWBTQuVlpY_lfv5YibTeKfJRtmJYnkJQ.XakDf6k6gOLWmWkJjuA9pVDUZKkMrCXwY8yRInyKIoMPMdPDa4kRIh1ghW2K7VLJfjGu6qXW1kPGFVRTF0wKkN4JKY4J.TLPlSEI9uuudXnam8OY5RZJA--%26lp= - OK
https://info.yahoo.com/privacy/us/yahoo/relevantads.html - OK
https://beap.gemini.yahoo.com/mbclk?bv=1.0.0&es=8j5uUzIGIS8bthoOIIlefINlCyUX0sMagCIuZQ05jmBfB74DwldI_rYOX1OS5kBByKf6VXv1ZfletO8DFuwVrss1EH7zcp7sC3mOkIDCDckHezCh6uetN9gABHeBIVJhY_Gh2YQZYlGcNjg0Ls4p9bZZt6jMNKDm_Deq0awAlb3iWN9MmuRf_3FnL8iztj2LLuB2G4qXUU5aZe_8bv54J3eChnAjgZEpXOjwZ0PX.aDMFrGxPY80WmXuIOd_k7ddLrVufsMXvVGZDkbqPaoyUidc2jukZlTGmbtJsq9PgokEscfHPYWw4KjDZT4js_9x74ME6IB.Pg3f6zuO1S6cb9kuc7WZ6wtRj73lilaXMuXv_mp5N7HB1USXa0Qy.S.PSZOX7kxczmPfD7znequq2Cova59KLDCDgj_kJM8zAGMKDrm7iWBTQuVlpY_lfv5YibTeKfJRtmJYnkJQ.XakDf6k6gOLWmWkJjuA9pVDUZKkMrCXwY8yRInyKIoMPMdPDa4kRIh1ghW2K7VLJfjGu6qXW1kPGFVRTF0wKkN4JKY4J.TLPlSEI9uuudXnam8OY5RZJA--%26lp= - OK
https://beap.gemini.yahoo.com/mbclk?bv=1.0.0&es=8j5uUzIGIS8bthoOIIlefINlCyUX0sMagCIuZQ05jmBfB74DwldI_rYOX1OS5kBByKf6VXv1ZfletO8DFuwVrss1EH7zcp7sC3mOkIDCDckHezCh6uetN9gABHeBIVJhY_Gh2YQZYlGcNjg0Ls4p9bZZt6jMNKDm_Deq0awAlb3iWN9MmuRf_3FnL8iztj2LLuB2G4qXUU5aZe_8bv54J3eChnAjgZEpXOjwZ0PX.aDMFrGxPY80WmXuIOd_k7ddLrVufsMXvVGZDkbqPaoyUidc2jukZlTGmbtJsq9PgokEscfHPYWw4KjDZT4js_9x74ME6IB.Pg3f6zuO1S6cb9kuc7WZ6wtRj73lilaXMuXv_mp5N7HB1USXa0Qy.S.PSZOX7kxczmPfD7znequq2Cova59KLDCDgj_kJM8zAGMKDrm7iWBTQuVlpY_lfv5YibTeKfJRtmJYnkJQ.XakDf6k6gOLWmWkJjuA9pVDUZKkMrCXwY8yRInyKIoMPMdPDa4kRIh1ghW2K7VLJfjGu6qXW1kPGFVRTF0wKkN4JKY4J.TLPlSEI9uuudXnam8OY5RZJA--%26lp= - OK
unknown protocol: javascript
https://in.finance.yahoo.com/news/salman-khan-katrina-kaif-sonakshi-052512176.html - OK
https://in.finance.yahoo.com/news/salman-khan-katrina-kaif-sonakshi-052512176.html - OK
https://in.finance.yahoo.com/news/salman-khan-katrina-kaif-sonakshi-052512176.html - OK
https://in.news.yahoo.com/rihanna-narrowly-avoids-wardrobe-malfunction-135255635.html - OK
https://in.news.yahoo.com/rihanna-narrowly-avoids-wardrobe-malfunction-135255635.html - OK
https://in.news.yahoo.com/rihanna-narrowly-avoids-wardrobe-malfunction-135255635.html - OK
https://in.style.yahoo.com/dipika-kakar-set-first-eid-marriage-green-sharara-052512000.html - OK
https://in.style.yahoo.com/dipika-kakar-set-first-eid-marriage-green-sharara-052512000.html - OK
https://in.style.yahoo.com/dipika-kakar-set-first-eid-marriage-green-sharara-052512000.html - OK
https://info.yahoo.com/privacy/us/yahoo/relevantads.html - OK
unknown protocol: javascript
https://in.style.yahoo.com/neha-kakkar-apologises-her-man-himansh-kohli-rude-073156251.html - OK
https://in.style.yahoo.com/neha-kakkar-apologises-her-man-himansh-kohli-rude-073156251.html - OK
https://in.style.yahoo.com/neha-kakkar-apologises-her-man-himansh-kohli-rude-073156251.html - OK
https://in.news.yahoo.com/alia-bhatt-apos-sister-shaheen-031551577.html - OK
https://in.news.yahoo.com/alia-bhatt-apos-sister-shaheen-031551577.html - OK
https://in.news.yahoo.com/alia-bhatt-apos-sister-shaheen-031551577.html - OK
https://in.news.yahoo.com/apos-why-love-island-contestants-183329153.html - OK
https://in.news.yahoo.com/apos-why-love-island-contestants-183329153.html - OK
https://in.news.yahoo.com/apos-why-love-island-contestants-183329153.html - OK
https://in.search.yahoo.com/search?p=India%20vs%20Afghanistan%202018&fr=fp-tts&fr2=ps - OK
https://in.search.yahoo.com/search?p=Bajrang%20Dal%20VHP%20CIA&fr=fp-tts&fr2=ps - OK
https://in.search.yahoo.com/search?p=Shujaat%20Bukhari&fr=fp-tts&fr2=ps - OK
https://in.search.yahoo.com/search?p=Dhivya%20Suryadevara&fr=fp-tts&fr2=ps - OK
https://in.search.yahoo.com/search?p=Luxury%20watches&fr=fp-tts&fr2=ps - OK
https://in.search.yahoo.com/search?p=FIFA%20World%20Cup%202018&fr=fp-tts&fr2=ps - OK
https://in.search.yahoo.com/search?p=UN%20Kashmir%20report&fr=fp-tts&fr2=ps - OK
https://in.search.yahoo.com/search?p=AAP%20dharna&fr=fp-tts&fr2=ps - OK
https://in.search.yahoo.com/search?p=Sanju%20poster&fr=fp-tts&fr2=ps - OK
https://in.search.yahoo.com/search?p=Race%203&fr=fp-tts&fr2=ps - OK
https://weather.yahoo.com/ - OK
https://in.news.yahoo.com/weather/in/maharashtra/pune-2295412/ - OK
https://in.news.yahoo.com/weather/in/maharashtra/pune-2295412/ - OK
https://in.news.yahoo.com/weather/in/maharashtra/pune-2295412/ - OK
https://in.news.yahoo.com/weather/in/maharashtra/pune-2295412/ - OK
null
null
null
https://cricket.yahoo.com/ - OK
https://cricket.yahoo.com/ - OK
https://cricket.yahoo.com/ - OK
no protocol:
https://in.news.yahoo.com/ - OK
https://in.style.yahoo.com/bengalureans-force-bbmp-re-look-bizarre-new-pet-licensing-bye-laws-notwithoutmydog-movement-095558668.html - OK
https://in.news.yahoo.com/photos-eid-ul-fitr-celebrations-slideshow-wp-095013253.html - OK
https://in.news.yahoo.com/photos-football-frenzy-grips-russia-slideshow-wp-085232287.html - OK
https://policies.yahoo.com/in/en/yahoo/privacy/index.htm - OK
http://in.advertising.yahoo.com/ - OK
careers.yahoo.com
https://in.help.yahoo.com/kb/helpcentral - OK
https://yahoo.uservoice.com/forums/206294-india-homepage - OK
PASSED: getLinks
===============================================
Default test
Tests run: 1, Failures: 0, Skips: 0
===============================================
===============================================
Default suite
Total tests run: 1, Failures: 0, Skips: 0
===============================================
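The run above covers a single page. To cover the whole site, as the question asks, one possible approach is a breadth-first walk: open a page, collect its links, and queue every link that still starts with homePage (the same test the question's code uses) so it gets opened in turn. The sketch below is an illustration under assumptions, not a tested crawler: SiteCrawler, collectLinks, stripFragment and maxPages are hypothetical names, and the step that reads the links of a page is passed in as a function so the walk itself does not depend on a browser.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper, not part of Selenium or the JDK.
final class SiteCrawler {
    private SiteCrawler() {}

    // Returns every distinct link found, in discovery order. Only links that
    // start with homePage are opened and expanded; links to other sites are
    // recorded but not followed. At most maxPages pages are opened.
    static Set<String> collectLinks(String homePage,
                                    Function<String, List<String>> extractLinks,
                                    int maxPages) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        seen.add(homePage);
        queue.add(homePage);
        int opened = 0;
        while (!queue.isEmpty() && opened < maxPages) {
            String page = queue.remove();
            opened++;
            for (String href : extractLinks.apply(page)) {
                if (href == null) {
                    continue;
                }
                String link = stripFragment(href);
                if (link.isEmpty()) {
                    continue;
                }
                // Set.add returns false for a link that was already recorded.
                if (seen.add(link) && link.startsWith(homePage)) {
                    queue.add(link);
                }
            }
        }
        return seen;
    }

    // "page#section" and "page" are the same document, so drop the fragment.
    static String stripFragment(String url) {
        int hash = url.indexOf('#');
        return hash >= 0 ? url.substring(0, hash) : url;
    }
}
```

With Selenium, the extractLinks function would presumably call driver.get(page), then driver.findElements(By.tagName("a")), and return the getAttribute("href") value of each element. Every link in the returned set can then be passed to CheckingLink(). The maxPages limit is there because a large site may otherwise keep the walk running for a very long time.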