Java JSoup wrong encoding issue

Time: 2014-08-13 21:45:59

Tags: java encoding jsoup

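I am using JSoup to parse data from a website that contains Hebrew content. The site does not specify an encoding, and I use the following code to parse the data: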

    private void crawl(String block, String plot) {
        String baseUrl = getDestinationUrl(block, plot);
        System.out.println(baseUrl);
        try {
            //Document doc = Jsoup.connect(baseUrl).get();
            Document doc = Jsoup.parse(new URL(baseUrl).openStream(), "ISO-8859-1", baseUrl);
            System.out.println(doc.toString());
        } catch (IOException e) {
            System.out.println(e.toString());
        }
    }

This gives the following output:

    <table width="580" border="0" cellspacing="1" cellpadding="0" align="center">
     <tbody>
      <tr align="right">
       <td class="TableBigHeader" colspan="2"><a name="General">&icirc;&eacute;&atilde;&ograve; &uacute;&euml;&eth;&aring;&eth;&eacute;/&atilde;&oacute; &icirc;&eacute;&atilde;&ograve; &uacute;&euml;&eth;&aring;&eth;&eacute;</a></td>
      </tr>
      <tr align="right">
       <td class="TableTextM"> 6204</td>
       <td align="right" class="TableHeader2">&acirc;&aring;&ugrave;:</td>
      </tr>

instead of:

    <table width="580" border="0" cellspacing="1" cellpadding="0" align="center">
      <tbody><tr align="right">
       <td class="TableBigHeader" colspan="2"><a name="General">מידע תכנוני/דף מידע תכנוני</a></td>
      </tr>
      <tr align="right">
        <td class="TableTextM">  6204</td>
        <td align="right" class="TableHeader2">גוש:</td>
      </tr>

I also tried using:

    Document doc = Jsoup.parse(new URL(baseUrl).openStream(), "UTF-8", baseUrl);

instead of

    Document doc = Jsoup.parse(new URL(baseUrl).openStream(), "ISO-8859-1", baseUrl);

In that case, the output is:

    <table width="580" border="0" cellspacing="1" cellpadding="0" align="center">
     <tbody>
      <tr align="right">
       <td class="TableBigHeader" colspan="2"><a name="General">???? ??????/?? ???? ??????</a></td>
      </tr>
      <tr align="right">
       <td class="TableTextM"> 6204</td>
       <td align="right" class="TableHeader2">???:</td>
      </tr>
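One possible explanation for the question marks, offered as a sketch and not a confirmed diagnosis: if the server sends each Hebrew letter as a single byte in the 0xE0 to 0xFA range (which the `&icirc;&eacute;&atilde;&ograve;` entities in the ISO-8859-1 output suggest), those bytes are not valid UTF-8. The decoder then substitutes U+FFFD for them, and a console that cannot display that character typically prints `?`. The class name `Utf8MisreadSketch` and the sample bytes below are illustrative assumptions, not taken from the site.

```java
import java.nio.charset.StandardCharsets;

public class Utf8MisreadSketch {

    // Decodes raw bytes as UTF-8; malformed input is replaced with U+FFFD.
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumed sample: the four bytes behind "&icirc;&eacute;&atilde;&ograve;".
        byte[] raw = {(byte) 0xEE, (byte) 0xE9, (byte) 0xE3, (byte) 0xF2};
        String decoded = decodeAsUtf8(raw);
        // Print code points instead of glyphs so the result does not depend on the console encoding.
        decoded.chars().forEach(c -> System.out.printf("U+%04X%n", c));
    }
}
```

Once bytes have been replaced this way the original text cannot be recovered from the string, which is why the UTF-8 attempt looks worse than the ISO-8859-1 one.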

How can I get the content in Hebrew? Thanks in advance.
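A minimal sketch of one likely direction, assuming the page is served as windows-1255. The question does not confirm this; it is inferred from the entities in the ISO-8859-1 output, whose byte values 0xEE 0xE9 0xE3 0xF2 map to מידע in that code page. ISO-8859-8 places the Hebrew letters at the same positions and is another candidate. The class and method names are hypothetical, and the code relies on the JDK shipping the windows-1255 charset, which full JDK distributions normally do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class HebrewRedecodeSketch {

    // Assumption: the site uses windows-1255; ISO-8859-8 is another candidate.
    static final Charset ASSUMED_PAGE_CHARSET = Charset.forName("windows-1255");

    // Undo an ISO-8859-1 misread: recover the original bytes, then decode them again.
    static String redecode(String misread) {
        byte[] raw = misread.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, ASSUMED_PAGE_CHARSET);
    }

    public static void main(String[] args) {
        // The characters that "&icirc;&eacute;&atilde;&ograve;" stands for.
        String misread = "\u00EE\u00E9\u00E3\u00F2";
        // Needs a console that can display Hebrew.
        System.out.println(redecode(misread));
    }
}
```

If the assumption holds, passing `"windows-1255"` as the charset argument of the `Jsoup.parse(new URL(baseUrl).openStream(), ..., baseUrl)` call from the question would avoid the misread in the first place, so no re-decoding step would be needed.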

0 answers:

No answers