Using jsoup to extract data in Java

Time: 2014-03-24 18:47:21

Tags: java web-scraping jsoup

I am trying to run this code, but I am hitting a "Null Pointer Exception" in my program. I used try and catch, but I don't know how to eliminate the problem. Here is the code:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import java.net.*;
import java.io.*;
import java.lang.NullPointerException;

public class WikiScraper {

    public static void main(String[] args) throws IOException {
        scrapeTopic("/wiki/Python");
    }

    public static void scrapeTopic(String url) {
        String html = getUrl("http://www.wikipedia.org/" + url);
        Document doc = Jsoup.parse(html);

        String contentText = doc.select("#mw-content-text>p").first().text();
        System.out.println(contentText);
        System.out.println("The url was malformed!");
    }

    public static String getUrl(String url) {
        URL urlObj = null;
        try {
            urlObj = new URL(url);
        } catch (MalformedURLException e) {
            System.out.println("The url was malformed!");
            return "";
        }
        URLConnection urlCon = null;
        BufferedReader in = null;
        String outputText = "";
        try {
            urlCon = urlObj.openConnection();
            in = new BufferedReader(new InputStreamReader(urlCon.getInputStream()));
            String line = "";
            while ((line = in.readLine()) != null) {
                outputText += line;
            }
            in.close();
        } catch (IOException e) {
            System.out.println("There was an error connecting to the URL");
            return "";
        }
        return outputText;
    }
}

The error shown is:

There was an error connecting to the URL
Exception in thread "main" java.lang.NullPointerException
    at hello.WikiScraper.scrapeTopic(WikiScraper.java:17)
    at hello.WikiScraper.main(WikiScraper.java:11)

1 Answer:

Answer 0 (score: 1)

You have

public static String getUrl(String url){
    // ...
    return "";
}

which always returns an empty string whenever the connection fails. Jsoup.parse("") then produces a document with no matching paragraphs, so doc.select("#mw-content-text>p").first() returns null, and calling .text() on that null throws the NullPointerException.
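The failure mode can be sketched without jsoup at all: any lookup that may return null must be guarded before a method is called on the result. Here the first(...) helper is hypothetical, standing in for jsoup's Elements.first(), which likewise returns null on an empty selection:

```java
import java.util.List;

public class NullFirstDemo {
    // Mimics jsoup's Elements.first(): returns null when the selection is empty.
    static String first(List<String> items) {
        return items.isEmpty() ? null : items.get(0);
    }

    public static void main(String[] args) {
        String head = first(List.of()); // null, like first() on an empty selection
        if (head != null) {
            System.out.println(head.trim());
        } else {
            System.out.println("no match"); // the guard avoids the NullPointerException
        }
    }
}
```

The same if-null guard applied to the question's first() call would have turned the crash into a readable "no match" message.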

Try

Document doc = Jsoup.connect("http://example.com/").get();

for example.
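Putting it together, here is a minimal sketch of the corrected method. It assumes the English-Wikipedia host en.wikipedia.org (the question's www.wikipedia.org is the language portal, not the article site); the selector is taken from the question, and the null check is the essential addition:

```java
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class WikiScraper {

    public static void main(String[] args) throws IOException {
        scrapeTopic("/wiki/Python");
    }

    public static void scrapeTopic(String url) throws IOException {
        // connect(...).get() fetches and parses the page in one step,
        // replacing the hand-rolled URLConnection/BufferedReader code.
        Document doc = Jsoup.connect("http://en.wikipedia.org" + url).get();

        // first() returns null when nothing matches the selector,
        // so guard before calling text() to avoid the NullPointerException.
        Element firstParagraph = doc.select("#mw-content-text > p").first();
        if (firstParagraph != null) {
            System.out.println(firstParagraph.text());
        } else {
            System.out.println("No paragraph found at " + url);
        }
    }
}
```

Note that connect() throws IOException on network failure instead of silently returning an empty document, so the caller sees the real error rather than a downstream NullPointerException.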