private static String[] getUrlSource2(String site) throws IOException {
    List<String> myList = new ArrayList<String>();
    URL url = new URL(site);
    HttpURLConnection conn = (HttpURLConnection) url.openConnection(); // Cast shouldn't fail
    HttpURLConnection.setFollowRedirects(true);
    conn.setRequestProperty("Accept-Encoding", "gzip, deflate");
    String encoding = conn.getContentEncoding();
    InputStream inStr = null;
    if (encoding != null && encoding.equalsIgnoreCase("gzip")) {
        inStr = new GZIPInputStream(conn.getInputStream());
    } else {
        inStr = conn.getInputStream();
    }
    BufferedReader in = new BufferedReader(new InputStreamReader(inStr, "UTF-8"));
    String inputLine;
    while ((inputLine = in.readLine()) != null) {
        myList.add(inputLine);
    }
    in.close();
    String[] arr = myList.toArray(new String[myList.size()]);
    return arr;
}
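This is my getSource method. For some reason it only gives me part of the page's source code, and I can't figure out why. If you can help, I would be deeply grateful.
For example, if you run: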
public class Main {
    public static void main(String[] args) {
        try {
            String[] A = getUrlSource2("https://www.google.pt/");
            for (int i = 0; i < A.length; i++) {
                System.out.print(i + " ");
                System.out.println(A[i]);
            }
        } catch (IOException e) {
            // Print the failure instead of silently swallowing it
            e.printStackTrace();
        }
    }
}
you get 5 lines of source code, when you should get around 300 to 400.
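
One way to tell a truncated download from a page that simply has very few, very long lines might be to print the total character count next to the line count. This is only a minimal sketch: the class SourceStats and the method totalChars are hypothetical names that are not part of the code above, and the sample data stands in for the array returned by getUrlSource2.

```java
import java.util.Arrays;
import java.util.List;

class SourceStats {
    // Hypothetical helper: sums the length of every line (line terminators are not counted).
    static long totalChars(List<String> lines) {
        long total = 0;
        for (String line : lines) {
            total += line.length();
        }
        return total;
    }

    public static void main(String[] args) {
        // Stand-in data; in the question this would be Arrays.asList(getUrlSource2(...)).
        List<String> lines = Arrays.asList("<!doctype html>", "<html>...</html>");
        System.out.println(lines.size() + " lines, " + totalChars(lines) + " characters");
    }
}
```

If the character count is large while the line count stays small, the page probably arrived in full and just contains few line breaks.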