I'm building an Android app that uses URLConnection to fetch web page content, but for some reason it converts the actual apostrophe (') into its ASCII decimal value (&#39;).
Example: Let's go to the party
becomes Let&#39;s go to the party.
I've already tried setting the charset of the InputStreamReader to ASCII, but that didn't help.
Code:
String bodyHtml;
URL url = new URL(webPage);
URLConnection urlConnection = url.openConnection();
urlConnection.setRequestProperty("Authorization", "Basic " + authStringEnc);
InputStream is = urlConnection.getInputStream();
InputStreamReader isr = new InputStreamReader(is, "ASCII");
int numCharsRead;
char[] charArray = new char[1024];
StringBuilder sb = new StringBuilder();
while ((numCharsRead = isr.read(charArray)) > 0) {
sb.append(charArray, 0, numCharsRead);
}
/*StringBuffer sb = new StringBuffer();
while ((numCharsRead = isr.read(charArray)) > 0) {
sb.append(charArray, 0, numCharsRead);
}*/
bodyHtml = sb.toString();
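The reader's charset is probably not what produces the entity. If the server already sends the apostrophe as the numeric character reference &#39;, those five characters are plain ASCII bytes, so any ASCII-compatible charset decodes them to the same literal text. A minimal sketch, assuming the response body literally contains "Let&#39;s go to the party" (the class name EntityCharsetDemo is hypothetical):

```java
import java.nio.charset.StandardCharsets;

public class EntityCharsetDemo {
    // Assumed response bytes: the apostrophe is already escaped by the server
    // as the HTML numeric character reference "&#39;" (plain ASCII text).
    static final byte[] BODY =
            "Let&#39;s go to the party".getBytes(StandardCharsets.US_ASCII);

    public static void main(String[] args) {
        String asAscii = new String(BODY, StandardCharsets.US_ASCII);
        String asUtf8 = new String(BODY, StandardCharsets.UTF_8);
        // ASCII bytes decode identically under ASCII and UTF-8,
        // so the entity survives either way.
        System.out.println(asAscii);                // Let&#39;s go to the party
        System.out.println(asAscii.equals(asUtf8)); // true
    }
}
```

If this matches what you see, the fix belongs after reading (unescaping the HTML), not in the choice of charset.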
Answer 0 (score: 1)
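// import java.net.URLDecoder;
// Decodes application/x-www-form-urlencoded text (%XX escapes and '+').
// The one-argument decode(String) is deprecated; pass the charset explicitly.
bodyHtml = URLDecoder.decode(bodyHtml, "UTF-8");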
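URLDecoder handles URL encoding, where an apostrophe appears as %27, rather than HTML character references. The sketch below contrasts the two inputs; it suggests that URLDecoder on its own leaves &#39; unchanged. The class UrlDecoderDemo and its helper urlDecode are hypothetical names used only for illustration:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class UrlDecoderDemo {
    // Wraps URLDecoder.decode(String, String), which declares a checked exception.
    static String urlDecode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Percent-encoded apostrophe and '+' for spaces: URL encoding.
        System.out.println(urlDecode("Let%27s+go+to+the+party")); // Let's go to the party
        // An HTML numeric character reference is not URL encoding: passed through.
        System.out.println(urlDecode("Let&#39;s go to the party")); // Let&#39;s go to the party
    }
}
```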
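Answer 1 (score: 0)
You need to parse the received string as HTML and convert it back to a string. Html.fromHtml(value) interprets the received value as HTML, which turns entities such as &#39; back into the characters they stand for. Calling .toString() then returns the plain string (without any HTML markup).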
// Import this class
import android.text.Html;
After receiving the content from the URL, you can convert it into readable form:
String value = "Let&#39;s go to the party";
String formattedValue = Html.fromHtml(value).toString().trim();
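android.text.Html is only available on Android, so it cannot be exercised in a plain JVM unit test. As a rough stand-in for that case, here is a minimal pure-Java sketch that decodes only numeric character references (decimal &#39; and hexadecimal &#x27;). The class NumericEntityDecoder and method decodeNumericEntities are hypothetical; named entities such as &amp; are deliberately left untouched, so this is not a full replacement for Html.fromHtml:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumericEntityDecoder {
    // Matches decimal (&#39;) and hexadecimal (&#x27;) numeric character references.
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:([0-9]+)|[xX]([0-9a-fA-F]+));");

    static String decodeNumericEntities(String input) {
        Matcher m = NUMERIC_REF.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(input, last, m.start()); // text before the reference
            int codePoint = -1;
            try {
                codePoint = m.group(1) != null
                        ? Integer.parseInt(m.group(1))
                        : Integer.parseInt(m.group(2), 16);
            } catch (NumberFormatException e) {
                // Number too large for an int: treat as invalid.
            }
            if (Character.isValidCodePoint(codePoint)) {
                out.appendCodePoint(codePoint);
            } else {
                out.append(m.group()); // keep invalid references verbatim
            }
            last = m.end();
        }
        out.append(input, last, input.length()); // trailing text
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decodeNumericEntities("Let&#39;s go to the party"));
    }
}
```

On a device, Html.fromHtml remains the simpler choice because it also handles named entities; on API 24 and later the overload that takes a flags argument (for example Html.FROM_HTML_MODE_LEGACY) replaces the deprecated single-argument version.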