Downloaded String comes back as gibberish

Time: 2018-07-26 17:39:35

Tags: java android

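I am trying to read the text of a web page's source code into a String. The result vaguely resembles the site's HTML, but the text itself is nonsense. I am following a tutorial, and the source code supplied by the instructor gives me the same problem. It also happens for every site I try. Could something be wrong with my computer or my internet connection?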

Logged result:

07-26 17:29:49.143 10863-10863/org.andrewedgar.downloadwebcontent I/Result: !otp tl
<-[fl E7>   hm ls=n-sl-e ti8l-e"ln=" !edf-><-[fI ]     hm ls=n-sl-e ti8 ag"><[ni]-
!-i E8>    <tlcas"oj ti9 ag"><[ni]-
!-i tI ]<-><tlcas"oj"ln=e" !-!edf->  <ed
    mt hre=uf8>    <eanm=vepr"cnet"it=eiewdh nta-cl="
    mt ae"ecito"cnet"omi  ooulaplnigpg hr nld wsm adn aedms"
    mt ae"uhr otn=Wwhmz>
    tteZpyoe/il>
    ln e=sotu cn ye"mg/-cn rf"sai/m/aio.n"
    <- otAeoeCS->    <ikrl"tlset rf"sai/s/otaeoemncs>    <- hmf cn S -
    ln e=syehe"he=/ttccsteiyioscs>    <- lgn otIosCS->    <ikrl"tlset rf"sai/s/lgn-otioscs>    <- lgn ieIosCS->    <ikrl"tlset rf"sai/s/lgn-ieioscs>    <- otta S -
    ln e=syehe"he=/ttccsbosrpmncs>    <- lcnvCS->    <ikrl"tlset rf"sai/s/lcnvmncs>    <- nmt S -
    ln e=syehe"he=/ttccsaiaemncs>   <- eoo S -
    ln e=syehe"he=/ttccsvnbxvnbxcs> <- W-aoslCS->    <ikrl"tlset rf"sai/s/w.aoslcs> <- anCS->    <ikrl"tlset rf"sai/s/ancs> <- epnieCS->    <ikrl"tlset rf"sai/s/epniecs>
    srp r=/ttcj/edrmdrir283rsod142mnj"<srp>  <ha>  <oydt-p=srl"dt-agt"nveu aaofe=7"
    !-i tI ]
      pcas"rweugae>o r sn n<togotae<srn>bosr lae< rf"tp/boshpycm"ugaeyu rwe<a oipoeyu xeine<p
    !edf->
    dvi=peodr 
      dvcas'odr 
        dvcas"atr"<dv
      /i>    <dv<- rlae -
    <edri=hae"cas"edrscin>      <i ls=cnanr>        <a ls=nva"
          ahe=# ls=nva-rn"<m d"rnLg"sc"sai/m/apCdLgWtTx.n"at"apcd"<a
          dvcas"-lxmn-rp>            dvi=nveu ls=mimn"
             <lcas"a"
                l < aasrl ls=nvln cie rf"hm"Hm sa ls=s-ny>cret<sa>/>/i
              /l
           <dv
            dvcas"eubn>              < rf"tp:/er.apcd.o"cas"utn1>er<a
            /i>          <dv
        /a>      <dv
    /edr !-Hae -
    <eto d"oe ls=hr_eto rdat1pdig>    <i ls=dslytbe>        <i ls=tbecl"
          dvcas"otie"
            dvcas"eocnet>             <1Lancd h<rfnwy/1
              pPormigdenthv ob oigtdosadfutan.b>oehv oefnadlanhwt oe<p
              ahe=hts/lanzpyoecm ls=bto_"LanNw/>            <dv
          /i>
        /i>   <dv
    /eto>!-Hr eto -
    <- QeyLb->  <citsc"sai/svno/qey11..i.s>/cit
    !-BosrpJ -
    srp r=/ttcj/edrbosrpmnj"<srp>   <- ehrJ -
    srp r=/ttcj/edrtte.i.s>/cit
    !-wyonsj -
    srp r=/ttcj/edrjur.apit.203mnj"<srp>    <- lcnvJ -
    srp r=/ttcj/edrjur.lcnvmnj"<srp>    <- W-aoslJ -
    srp r=/ttcj/edrolcrue.i.s>/cit
    !-CutrpJ -
    srp r=/ttcj/edrjur.oneu.i.s>/cit
    !-Sot colJ -
    srp r=/ttcj/edrsot-colmnj"<srp> <- edrJ -
    srp r=/ttcj/edrvnbxmnj"<srp>    <- jxhm S-> <citsc"sai/svno/qeyaacipmnj"<srp>   <- o S->    <citsc"sai/svno/o.i.s>/cit
    !-Mi S->    <citsc"sai/smi.s>/cit
  <bd><hm>

Code:

   public class DownloadTask extends AsyncTask<String, Void, String> {

    @Override
    protected String doInBackground(String... urls) {

        String result = "";
        URL url;
        HttpURLConnection urlConnection = null;

        try {
            url = new URL(urls[0]);
            urlConnection = (HttpURLConnection) url.openConnection();
            InputStream in = urlConnection.getInputStream();
            InputStreamReader reader = new InputStreamReader(in);
            int data = reader.read();

            while (data != -1) {
                data = reader.read();
                char current = (char) data;
                result += current;
                data = reader.read();
            }
            return result;



        } catch (Exception e) {
            e.printStackTrace();
            return "Failed";
        }

    }
}

@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);

    DownloadTask task = new DownloadTask();

    String result = null;


    try {
        result = task.execute("http://www.zappycode.com").get();

    } catch (Exception e) {

        e.printStackTrace();
    }

    Log.i("Result", result);
}
}

2 answers:

Answer 0 (score: 1)

You read from the stream twice per iteration:

while (data != -1) {
  data = reader.read();  // <<- here
  char current = (char) data;
  result += current;
  data = reader.read(); // <<- and here
}

But you append to the result only once, so you end up with only every other character (the odd ones). Something like this should work:

int data;
while ((data = reader.read()) != -1) result += (char) data;
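The skipping can be reproduced without a device or a network connection. The following is a minimal sketch, assuming a `StringReader` as a stand-in for the HTTP stream; the class and method names (`SkipDemo`, `buggyRead`, `fixedRead`) are made up for illustration. Fed the first line of an HTML page, the loop from the question yields the same `!otp tl` that opens the log above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class SkipDemo {
    // The loop from the question: two read() calls per iteration, one append.
    static String buggyRead(Reader reader) {
        try {
            String result = "";
            int data = reader.read();          // first character is never appended
            while (data != -1) {
                data = reader.read();          // appended
                char current = (char) data;
                result += current;
                data = reader.read();          // only used for the loop test
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // One read() per iteration, so every character is appended.
    static String fixedRead(Reader reader) {
        try {
            StringBuilder result = new StringBuilder();
            int data;
            while ((data = reader.read()) != -1) {
                result.append((char) data);
            }
            return result.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String firstLine = "<!doctype html>\n"; // sample input, not fetched from a site
        System.out.print(buggyRead(new StringReader(firstLine)));
        System.out.print(fixedRead(new StringReader(firstLine)));
    }
}
```

Note that when the input has an odd number of characters, the buggy loop also appends `(char) -1` at the very end, because the first `read()` inside the loop body is not checked against -1.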

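In general, though, reading the input one value at a time and casting each one to a char is not a good idea. Something like this would be more robust: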

BufferedReader br = new BufferedReader(reader);
StringBuilder accumulator = new StringBuilder();
String line;  // Java does not allow declaring the variable inside the while condition
while ((line = br.readLine()) != null) {
  accumulator.append(line).append(System.lineSeparator());
}
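Put together as a self-contained helper, the buffered approach might look like the sketch below. It assumes the page is UTF-8 encoded and uses a `ByteArrayInputStream` in place of `urlConnection.getInputStream()` so that it runs anywhere; `StreamText` and `readAll` are hypothetical names, not part of the tutorial code.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class StreamText {
    // Hypothetical helper: reads the whole stream as UTF-8 text, line by line.
    static String readAll(InputStream in) {
        StringBuilder accumulator = new StringBuilder();
        // try-with-resources closes the reader (and the wrapped stream) when done.
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                // readLine() strips the line terminator, so add one back.
                accumulator.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return accumulator.toString();
    }

    public static void main(String[] args) {
        // Sample bytes standing in for a downloaded page.
        byte[] page = "<!doctype html>\n<html></html>\n".getBytes(StandardCharsets.UTF_8);
        System.out.print(readAll(new ByteArrayInputStream(page)));
    }
}
```

Inside `doInBackground`, the same helper could be called with the connection's input stream. Using a `StringBuilder` also avoids creating a new String on every `+=`.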

Answer 1 (score: 0)

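It looks like your code is reading raw 8-bit ASCII characters and displaying them. The site may be using a different character encoding (see this Wikipedia article on encoding). Instead of reading byte by byte, use a buffered reader and let Java convert the sequence of encoded bytes into a String. @xtratic pointed to another answer on StackOverflow whose code sample will work here: How to read an http input stream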
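To illustrate what an encoding mismatch looks like, here is a small sketch that decodes the same bytes with two different charsets. The sample word and the names `CharsetDemo` and `decode` are assumptions made for the example. A wrong charset typically substitutes characters rather than dropping them, which is one way to tell it apart from a loop that skips reads.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDemo {
    // Decodes raw bytes into a String using the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "café" written with a Unicode escape so the source file encoding does not matter.
        byte[] utf8Bytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // Correct charset: the original text comes back.
        System.out.println(decode(utf8Bytes, StandardCharsets.UTF_8));

        // Wrong charset: the two UTF-8 bytes of the last letter become two separate characters.
        System.out.println(decode(utf8Bytes, StandardCharsets.ISO_8859_1));
    }
}
```

Passing the charset explicitly, for example `new InputStreamReader(in, StandardCharsets.UTF_8)`, avoids depending on the platform default.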