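I am using JSoup to parse data from a website that contains Hebrew content. The site does not specify an encoding, and I parse the data with the following code: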
private void crawl(String block, String plot) {
    String baseUrl = getDestinationUrl(block, plot);
    System.out.println(baseUrl);
    try {
        //Document doc = Jsoup.connect(baseUrl).get();
        Document doc = Jsoup.parse(new URL(baseUrl).openStream(), "ISO-8859-1", baseUrl);
        System.out.println(doc.toString());
    } catch (IOException e) {
        System.out.println(e.toString());
    }
}
This gives the following output:
<table width="580" border="0" cellspacing="1" cellpadding="0" align="center">
<tbody>
<tr align="right">
<td class="TableBigHeader" colspan="2"><a name="General">îéãò úëðåðé/ãó îéãò úëðåðé</a></td>
</tr>
<tr align="right">
<td class="TableTextM"> 6204</td>
<td align="right" class="TableHeader2">âåù:</td>
</tr>
instead of:
<table width="580" border="0" cellspacing="1" cellpadding="0" align="center">
<tbody><tr align="right">
<td class="TableBigHeader" colspan="2"><a name="General">מידע תכנוני/דף מידע תכנוני</a></td>
</tr>
<tr align="right">
<td class="TableTextM"> 6204</td>
<td align="right" class="TableHeader2">גוש:</td>
</tr>
I also tried using:
Document doc = Jsoup.parse(new URL(baseUrl).openStream(), "UTF-8", baseUrl);
instead of
Document doc = Jsoup.parse(new URL(baseUrl).openStream(), "ISO-8859-1", baseUrl);
In that case the output is:
<table width="580" border="0" cellspacing="1" cellpadding="0" align="center">
<tbody>
<tr align="right">
<td class="TableBigHeader" colspan="2"><a name="General">???? ??????/?? ???? ??????</a></td>
</tr>
<tr align="right">
<td class="TableTextM"> 6204</td>
<td align="right" class="TableHeader2">???:</td>
</tr>
How can I get the content in Hebrew? Thanks in advance.
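
One observation that may help narrow this down: the garbled ISO-8859-1 output looks like what you get when Hebrew text in a single-byte code page is decoded with the wrong charset. The sketch below is only a check of that guess, not a confirmed diagnosis of the site. It takes the first garbled word from the output, turns it back into raw bytes, and decodes those bytes again as windows-1255. The class name EncodingSketch, the helper redecode, and the choice of windows-1255 are all assumptions made for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingSketch {

    // Hypothetical helper: undo a wrong ISO-8859-1 decode by recovering the
    // raw bytes, then decode them again with the charset we are guessing at.
    static String redecode(String garbled, String guessedCharset) {
        // ISO-8859-1 maps U+0000..U+00FF one-to-one onto bytes, so this is lossless
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, Charset.forName(guessedCharset));
    }

    public static void main(String[] args) {
        // First garbled word from the ISO-8859-1 output, written with
        // Unicode escapes so the source file encoding does not matter
        String garbled = "\u00EE\u00E9\u00E3\u00F2";

        // Assumption: the site actually serves windows-1255 bytes
        String fixed = redecode(garbled, "windows-1255");

        // Compare with the first expected Hebrew word (mem, yod, dalet, ayin)
        System.out.println(fixed.equals("\u05DE\u05D9\u05D3\u05E2")); // prints true
    }
}
```

Since the round trip reproduces the expected word, it seems plausible that passing "windows-1255" as the charset name in the same Jsoup.parse(InputStream, String, String) call would return readable Hebrew, although I have not verified this against the site itself. The question marks in the UTF-8 attempt would fit the same picture, because those bytes are not valid UTF-8 sequences.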