JSoup Malformed URL Exception

Date: 2018-04-25 19:50:16

Tags: java arraylist jsoup

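I am trying to use JSoup to open a list of links that I have stored in an ArrayList named arrayLinks. When I run the code to open a link from the ArrayList, I get a malformed URL exception. However, if I take the supposedly malformed links and hard-code them into the application, I get no error. I have looked at several other posts that suggest using string formatters or UTF-8, but nothing seems to work. Any suggestions would be greatly appreciated. Thanks!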

Code that does not work:

article = Jsoup.connect(arrayLinks.get(i)).get();

Error:

Caused by: java.net.MalformedURLException: no protocol: "https://www.sbnation.com/college-football-recruiting/2014/7/3/5715252/cordell-broadus-recruit-scouting-report-sure-handed-receiver"
at java.base/java.net.URL.<init>(URL.java:627)
at java.base/java.net.URL.<init>(URL.java:523)
at java.base/java.net.URL.<init>(URL.java:470)
at org.jsoup.helper.HttpConnection.url(HttpConnection.java:132)

Code that works:

article = Jsoup.connect("https://www.sbnation.com/college-football-recruiting/2014/7/3/5715252/cordell-broadus-recruit-scouting-report-sure-handed-receiver").get();

2 answers:

Answer 0 (score: 2)

This works fine for me.

import java.io.IOException;
import java.util.ArrayList;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class WebScraping {
    public static void main(String[] args) throws IOException {

        // The list is explicitly typed as ArrayList<String>.
        ArrayList<String> arrayLinks = new ArrayList<String>();
        arrayLinks.add("https://www.google.com");
        arrayLinks.add("https://www.youtube.com");
        arrayLinks.add("https://www.facebook.com");
        arrayLinks.add("https://www.sbnation.com/college-football-recruiting/2014/7/3/5715252/cordell-broadus-recruit-scouting-report-sure-handed-receiver");

        // Fetch each page and print its title.
        for (int i = 0; i < arrayLinks.size(); i++) {
            Document doc = Jsoup.connect(arrayLinks.get(i)).get();
            System.out.println(doc.title());
        }
    }
}

Output:

Google
YouTube
Facebook - ??? ?? ?? ???? ?? ????
Cordell Broadus recruit scouting report: Sure-handed receiver - SBNation.com

I think you have not declared your ArrayList with the String type, and that is why you are getting the malformed URL exception.

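Answer 1 (score: 2)

Did you ever solve this?

It looks like the problem is the quotation marks. The source of java.net.URL shows that it does not put quotes around the malformed URL in its message, so the quotes in your error output must be part of the string itself: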

throw new MalformedURLException("no protocol: "+original);

Sure enough, this reproduces the exception you reported:

Jsoup.connect("\"https://www.sbnation.com/college-football-recruiting/2014/7/3/5715252/cordell-broadus-recruit-scouting-report-sure-handed-receiver\"").get();

...which results in:

Caused by: java.net.MalformedURLException: no protocol: "https://www.sbnation.com/college-football-recruiting/2014/7/3/5715252/cordell-broadus-recruit-scouting-report-sure-handed-receiver"
    at java.net.URL.<init>(URL.java:586)
    at java.net.URL.<init>(URL.java:483)
    at java.net.URL.<init>(URL.java:432)
    at org.jsoup.helper.HttpConnection.url(HttpConnection.java:76)
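
If the strings in arrayLinks really do carry literal quote characters (for example because they were copied out of JSON or an HTML attribute without unquoting), one possible fix is to strip them before calling Jsoup.connect. The following is only a minimal sketch under that assumption: the class name QuoteStripper and the helpers stripQuotes and isWellFormed are hypothetical, and it uses java.net.URL directly instead of JSoup so that it runs without any extra dependency.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class QuoteStripper {

    // Hypothetical helper: trims whitespace and removes one pair of
    // surrounding double quotes, if both are present.
    public static String stripQuotes(String raw) {
        String s = raw.trim();
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            s = s.substring(1, s.length() - 1);
        }
        return s;
    }

    // Hypothetical helper: reports whether java.net.URL accepts the string.
    // Only parsing is checked; no network connection is opened.
    @SuppressWarnings("deprecation")
    public static boolean isWellFormed(String link) {
        try {
            new URL(link);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed input: one link with embedded quotes, one without.
        List<String> arrayLinks = new ArrayList<>();
        arrayLinks.add("\"https://www.google.com\"");
        arrayLinks.add("https://www.youtube.com");

        for (String link : arrayLinks) {
            String cleaned = stripQuotes(link);
            System.out.println(link + " -> " + cleaned
                    + " (well-formed: " + isWellFormed(cleaned) + ")");
        }
    }
}
```

With a cleanup step like this in place, the original call would become Jsoup.connect(stripQuotes(arrayLinks.get(i))).get(). It is probably better still to find where the quotes are added when the list is populated and remove them at the source.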