URL.openConnection() does not wait for the page to finish loading

Time: 2015-11-23 07:28:30

Tags: java html http https web-scraping

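I'm trying to load a page from my university so that I can pull my grades directly into my program. I'm using HttpsURLConnection, but whenever I call URL.openConnection and send a GET request, I get back a loading page with no useful information.

How can I load the page completely, so that I get the data from the actual page rather than from the loading page?

Edit: This is what I get back: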

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

<head>
    <title>The University of Auckland - PeopleSoft Logon....</title>
    <!--<base href="https://iam.auckland.ac.nz" />-->
    <script type="text/javascript">
        function changeInputType(oldElm, // a reference to the input element  iType, // value of the type property: 'text' or 'password'  iValue, // the default value, set to 'password' in the demo  blankValue, // true if the value should be empty, false otherwise  noFocus) {  // set to true if the element should not be given focus  if(!oldElm || !oldElm.parentNode || (iType.length<4) ||     !document.getElementById || !document.createElement) return;  var isMSIE=/*@cc_on!@*/false; //http://dean.edwards.name/weblog/2007/03/sniff/  if(!isMSIE){    var newElm=document.createElement('input');    newElm.type=iType;  } else {    var newElm=document.createElement('span');    newElm.innerHTML='<input type="'+iType+'" name="'+oldElm.name+'">';    newElm=newElm.firstChild;  }  var props=['name','id','className','size','tabIndex','accessKey'];  for(var i=0,l=props.length;i<l;i++){    if(oldElm[props[i]]) newElm[props[i]]=oldElm[props[i]];  }  newElm.onfocus=function(){return function(){    if(this.hasFocus) return;    var newElm=changeInputType(this,'password',iValue,      (this.value.toLowerCase()==iValue.toLowerCase())?true:false);    if(newElm) newElm.hasFocus=true;  }}();  newElm.onblur=function(){return function(){    if(this.hasFocus)    if(this.value=='' || (this.value.toLowerCase()==iValue.toLowerCase())) {      changeInputType(this,'text',iValue,false,true);    }  }}(); // hasFocus is to prevent a loop where onfocus is triggered over and over again  newElm.hasFocus=false;  // some browsers need the value set before the element is added to the page  // while others need it set after  if(!blankValue) newElm.value=iValue;  oldElm.parentNode.replaceChild(newElm,oldElm);  if(!isMSIE && !blankValue) newElm.value=iValue;  if(!noFocus || typeof(noFocus)=='undefined') {    window.tempElm=newElm;    setTimeout("tempElm.hasFocus=true;tempElm.focus();",1);  }  return newElm;}function readCookie(name) {var nameEQ = name + "=";var ca = document.cookie.split(';');for(var i=0;i < 
ca.length;i++) {  var c = ca[i];  while (c.charAt(0)==' ') c = c.substring(1,c.length);   if (c.indexOf(nameEQ) == 0) {       if (c.substring(nameEQ.length,c.length)!="(null)") {            return c.substring(nameEQ.length,c.length);     }   }}return null;}function createCookie(name,value,days) {if (days) {  var date = new Date();  date.setTime(date.getTime()+(days*24*60*60*1000));  var expires = "; expires="+date.toGMTString();}else var expires = "";document.cookie = name+"="+value+expires+"; path=/";}function createCookieSeconds(name,value,seconds) {if (seconds) {  var date = new Date();  date.setTime(date.getTime()+(seconds*1000));    var expires = "; expires="+date.toGMTString();} else var expires = "";document.cookie = name+"="+value+expires+"; path=/";}function eraseCookie(name) {createCookie(name,"",-1);}// infinite loop detection var loopdetect=readCookie("loopdetect");if (loopdetect) {   loopdetect=parseInt(loopdetect)+1;} else {  loopdetect=0;}createCookieSeconds("loopdetect",loopdetect,15);
    </script>
    <link rel="icon" href="/plugins/uoa-style-0.2/images/favicon.ico" type="image/vnd.microsoft.icon" />
    <link rel="stylesheet" type="text/css" media="screen" href="/plugins/bundle/bundle.css" />
    <!--        <script type="text/javascript" src="/js/bundle/0/bundle.js" ></script> -->
    <meta name="layout" content="transition" />
    <meta name="layout" content="main" />
    <script language="JavaScript" type="text/javascript">
        function signin(form) {
                var docLoc = new String(document.location);
                var iLast = docLoc.lastIndexOf("?&");
                if (docLoc.length == (iLast + 2)) {
                    docLoc = docLoc.substring(0, iLast);
                }
                if (docLoc.indexOf("?cmd=") == -1 && docLoc.indexOf("?") != -1) {
                    if (docLoc.indexOf("&cmd=login") == -1) {
                        var i = docLoc.length - 1;
                        var j = docLoc.lastIndexOf("&");
                        if (j != -1 && i == j) {
                            form.action = docLoc + form.action.substring(form.action.indexOf("?") + 1, form.action.length);
                        } else {
                            form.action = docLoc + "&" + form.action.substring(form.action.indexOf("?") + 1, form.action.length);
                        }
                    } else {
                        form.action = docLoc; // form.action=docLoc.substring(0,docLoc.indexOf("&cmd=login"))+"&cmd=login"+docLoc.substring(docLoc.indexOf("&languageCd="),docLoc.length);         // 2011/09/25 - not sure why the above customisation was commented out, so left it commented out }} else {  form.action=docLoc;} var now=new Date(); form.timezoneOffset.value=now.getTimezoneOffset(); return ;}function setFocus(){try {document.login.userid.focus()}catch (e)    {};return;}function submitAction(form){signin(form);form.action=form.action.replace("errorCode=106","");form.Submit.disabled=true;form.submit();}
    </script>
</head>

<body onload="uoaOnLoad()" style="cursor:progress">
    <div id="uoa_loading" style="visibility:visible;position:absolute;top:0px;left:0px;width:100%;" align="right">
        <div id="main">
            <div id="main-body-left">
                <div id="main-body-right">
                    <div id="main-body">
                        <div id="mastHeader_round">
                            <div class="head">
                                <a href="http://www.auckland.ac.nz" class="logo" title="The University of Auckland" tabindex="-1"><img src="/plugins/uoa-style-0.2/images/logo.png" alt="The University of Auckland" />
                                </a>
                            </div>
                            <div class="clear"> </div>
                        </div>
                        <div id="container" class="main-container">
                            <div class="page-loading">Loading....</div>
                            <div class="clear"> </div>
                        </div>
                        <div id="mastFooter">
                            <p id="copyright"><a href="http://www.auckland.ac.nz/uoa/" target="_self">Copyright &copy; The University of Auckland</a>
                            </p>
                            <p id="footerLogo">
                                <a target="_blank" title="APRU" href="http://apru.nus.edu.sg/" id="apru"> </a>
                                <a target="_blank" title="Universitas" href="http://www.universitas21.com/" id="universitas"> </a>
                            </p>
                            <p id="footerNav"><a href="http://www.auckland.ac.nz/uoa/home/about/the-university/atoz-directory">A to Z Directory</a> | <a href="http://www.auckland.ac.nz/uoa/home/site-map">Site map</a> | <a href="http://www.auckland.ac.nz/uoa/home/accessibility">Accessibility</a> | <a href="http://www.auckland.ac.nz/uoa/home/copyright">Copyright</a> | <a href="http://www.auckland.ac.nz/uoa/home/disclaimer">Disclaimer</a> | <a href="http://www.auckland.ac.nz/uoa/home/privacy">Privacy</a> | <a href="mailto:contactus@auckland.ac.nz">Feedback on this page</a>
                            </p>
                            <div class="clear"> </div>
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </div>
    <div id="uoa_ssobody" style="visibility:hidden;position:absolute;top:0px;left:0px;width:100%">
        <div id="main">
            <div id="main-body-left">
                <div id="main-body-right">
                    <div id="main-body">
                        <div id="mastHeader_round">
                            <div class="head">
                                <a href="http://www.auckland.ac.nz" class="logo" title="The University of Auckland" tabindex="-1"><img src="/plugins/uoa-style-0.2/images/logo.png" alt="The University of Auckland" />
                                </a>
                            </div>
                            <div class="clear"> </div>
                        </div>
                        <div id="container" class="main-container">
                            <div id="login" class="form-dialog">
                                <h1>The University of Auckland</h1>
                                <fieldset class="login">
                                    <legend>System Information</legend>
                                    <!--        <div class="formElement">                   <div class="field textContainer">                   </div>              </div>              <div class="formElement">                   <div class="field textContainer">                   </div>              </div>-->
                                    <div class="login-options">
                                        <div style="width:94%;text-align:left;position:relative;left:3%;">
                                            <p> <font size=2><font color=red><b><br><div id="extra-errors">You may not have access to the system you have attempted to sign in to.</div></b><br></font> </div>
                                        <hr>
                                        <br>
                                        <div style="width:94%;text-align:left;position:relative;left:3%;"> If you are an applicant and want to review the status of your applications or submit a new application, sign in to:
                                            <br> <a href="https://apply.auckland.ac.nz">https://apply.auckland.ac.nz</a>
                                            <br>
                                            <br> Enrolled students should sign in to:
                                            <br> <a href="https://www.student.auckland.ac.nz">https://www.student.auckland.ac.nz</a>
                                            <br>
                                            <br> To update your personal details, visit:
                                            <br> <a href="https://iam.auckland.ac.nz/identity">https://iam.auckland.ac.nz/identity</a>
                                            <br>
                                            <br> If you believe you have received this message in error, please call
                                            <br>0800 61 62 63 (New Zealand only) or +64 9 373 7999.
                                            <br>
                                            <br> </font>
                                        </div>
                                        </font>
                                        </p>
                                    </div>
                                </fieldset>
                            </div>
                            <div id="message" class="form-dialog">
                                <h3>Protect your privacy</h3>
                                <p>Remember to always log out by <a href="https://wiki.auckland.ac.nz/display/IAMHELP/How+to+Single-Log-Out+by+completely+exiting+your+browser">completely exiting your browser</a> when you leave the computer.</p>
                                <p>This will protect your personal information from being accessed by subsequent users.</p>
                            </div>
                            <div class="clear"> </div>
                        </div>
                        <div id="mastFooter">
                            <p id="copyright"><a href="http://www.auckland.ac.nz/uoa/" target="_self">Copyright &copy; The University of Auckland</a>
                            </p>
                            <p id="footerLogo">
                                <a target="_blank" title="APRU" href="http://apru.nus.edu.sg/" id="apru"> </a>
                                <a target="_blank" title="Universitas" href="http://www.universitas21.com/" id="universitas"> </a>
                            </p>
                            <p id="footerNav"><a href="http://www.auckland.ac.nz/uoa/home/about/the-university/atoz-directory">A to Z Directory</a> | <a href="http://www.auckland.ac.nz/uoa/home/site-map">Site map</a> | <a href="http://www.auckland.ac.nz/uoa/home/accessibility">Accessibility</a> | <a href="http://www.auckland.ac.nz/uoa/home/copyright">Copyright</a> | <a href="http://www.auckland.ac.nz/uoa/home/disclaimer">Disclaimer</a> | <a href="http://www.auckland.ac.nz/uoa/home/privacy">Privacy</a> | <a href="mailto:contactus@auckland.ac.nz">Feedback on this page</a>
                            </p>
                            <div class="clear"> </div>
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </div>
    <div id="uoa_vanillabody" style="visibility:hidden;position:absolute;top:0px;left:0px;width:100%">
        <div id="main">
            <div id="main-body-left">
                <div id="main-body-right">
                    <div id="main-body">
                        <div id="mastHeader_round">
                            <div class="head">
                                <a href="http://www.auckland.ac.nz" class="logo" title="The University of Auckland" tabindex="-1"><img src="/plugins/uoa-style-0.2/images/logo.png" alt="The University of Auckland" />
                                </a>
                            </div>
                            <div class="clear"> </div>
                        </div>
                        <div id="container" class="main-container">
                            <div id="login" class="form-dialog">
                                <h1>The University of Auckland</h1>
                                <fieldset class="login">
                                    <legend>PeopleSoft login</legend>
                                    <!--        <div class="formElement">                   <div class="field textContainer">                   </div>              </div>              <div class="formElement">                   <div class="field textContainer">                   </div>              </div>-->
                                    <div class="login-options">
                                        <form action="?cmd=login&languageCd=ENG" method="post" id="login" name="login" autocomplete="off" onsubmit="signin(document.login)">
                                            <p>
                                                <input type="hidden" name="timezoneOffset" value="0">User ID:
                                                <input id="userid" name="userid" type="text" class="pslogineditbox" value="" size="15">
                                                <br>Password:
                                                <input type="hidden" id="pwd" name="pwd" class="pslogineditbox" size="15">
                                                <br>
                                                <input name="Submit" type="submit" class="psloginbutton" value="Sign In" onclick="submitAction(document.login)">
                                                <br>
                                                <br>
                                                <script>
                                                    document.write("<br>To set trace flags, click <a href='" + document.location + "&trace=y'>here</a>");
                                                </script><font color=red><br><b><br></b></font>
                                            </p>
                                    </div>
                                </fieldset>
                            </div>
                            <div id="message" class="form-dialog">
                                <h3>Protect your privacy</h3>
                                <p>Remember to always log out by <a href="https://wiki.auckland.ac.nz/display/IAMHELP/How+to+Single-Log-Out+by+completely+exiting+your+browser">completely exiting your browser</a> when you leave the computer.</p>
                                <p>This will protect your personal information from being accessed by subsequent users.</p>
                            </div>
                            <div class="clear"> </div>
                        </div>
                        <div id="mastFooter">
                            <p id="copyright"><a href="http://www.auckland.ac.nz/uoa/" target="_self">Copyright &copy; The University of Auckland</a>
                            </p>
                            <p id="footerLogo">
                                <a target="_blank" title="APRU" href="http://apru.nus.edu.sg/" id="apru"> </a>
                                <a target="_blank" title="Universitas" href="http://www.universitas21.com/" id="universitas"> </a>
                            </p>
                            <p id="footerNav"><a href="http://www.auckland.ac.nz/uoa/home/about/the-university/atoz-directory">A to Z Directory</a> | <a href="http://www.auckland.ac.nz/uoa/home/site-map">Site map</a> | <a href="http://www.auckland.ac.nz/uoa/home/accessibility">Accessibility</a> | <a href="http://www.auckland.ac.nz/uoa/home/copyright">Copyright</a> | <a href="http://www.auckland.ac.nz/uoa/home/disclaimer">Disclaimer</a> | <a href="http://www.auckland.ac.nz/uoa/home/privacy">Privacy</a> | <a href="mailto:contactus@auckland.ac.nz">Feedback on this page</a>
                            </p>
                            <div class="clear"> </div>
                        </div>
                    </div>
                </div>
            </div>
        </div>
        </form>
    </div>
    <script type="text/javascript">
        function uoaOnLoad() {
                var uoa_docLoc = new String(document.location);
                var uoa_logout = ((uoa_docLoc.indexOf("&cmd=logout") != -1) || (uoa_docLoc.indexOf("?cmd=logout") != -1));
                var uoa_login = ((uoa_docLoc.indexOf("&cmd=login") != -1) || (uoa_docLoc.indexOf("?cmd=login") != -1));
                var uoa_nosso = ((uoa_docLoc.indexOf("&sso=n") != -1) || (uoa_docLoc.indexOf("?sso=n") != -1));
                var uoa_query = (uoa_docLoc.indexOf("?") != -1);
                var uoa_slashlast = (uoa_docLoc.charAt(uoa_docLoc.length - 1) == "/");
                if (loopdetect > 5) {
                    uoa_logout = true;
                    uoa_login = false;
                    document.getElementById('extra-errors').innerHTML = "Cookie problem detected, login terminated. <BR><br></font><font color=black>Please close all your browser windows and then try again.";
                } else if (uoa_docLoc.indexOf("cmd=expire") != -1) {
                    document.getElementById('extra-errors').innerHTML = "Your connection has expired.<br><br></font><font color=black>You can try logging in again by <a href='" + uoa_docLoc.replace("cmd=expire", "cmd=login") + "'>clicking here</a>, or if that doesn't work, you will need to close all your browser windows and then try again.";
                    uoa_logout = true;
                    uoa_login = false;
                }
                if (!((uoa_login) || (uoa_logout))) {
                    if (uoa_query) {
                        window.location = uoa_docLoc + "&cmd=login&languageCd=ENG";
                    } else {
                        if (uoa_slashlast) {
                            window.location = uoa_docLoc + "?cmd=login&languageCd=ENG";
                        } else {
                            window.location = uoa_docLoc + "/?cmd=login&languageCd=ENG";
                        }
                    }
                } else {
                    if (!uoa_nosso) {
                        if (uoa_login) {
                            if (("" == "User ID and Password are required.") || ("" == "")) {
                                var uoa_randomnumber = Math.floor(Math.random() * 1000000000001) var uoa_ssouser_cookie = readCookie("uoa_ssouser");
                                eraseCookie("uoa_ssouser");
                                eraseCookie("uoa_ssouser");
                                eraseCookie("uoa_ssouser");
                                if ((uoa_ssouser_cookie == "(null)") || (uoa_ssouser_cookie == null)) {
                                    uoa_ssouser_cookie = "";
                                }
                                if (("" == uoa_ssouser_cookie) || ("" == "")) {
                                    var uoa_ssouser = uoa_ssouser_cookie + "*sso";
                                } else {
                                    var uoa_ssouser = uoa_ssouser_cookie + "*";
                                }
                                document.login.userid.value = uoa_ssouser;
                                document.login.pwd.value = "ssologin" + uoa_randomnumber;
                                submitAction(document.login);
                            } else {
                                document.getElementById("uoa_loading").style.visibility = "hidden";
                                document.getElementById("uoa_ssobody").style.visibility = "visible";
                                document.body.style.cursor = 'default';
                            }
                        } else {
                            if ((uoa_logout) && (document.getElementById('extra-errors').innerHTML == "You may not have access to the system you have attempted to sign in to.")) {
                                document.getElementById('extra-errors').innerHTML = "</font><font color=black>You've been logged out. <br><BR>You can try logging in again by <a href='" + uoa_docLoc.replace("cmd=logout", "cmd=login") + "'>clicking here</a>.";
                            }
                            document.getElementById("uoa_loading").style.visibility = "hidden";
                            document.getElementById("uoa_ssobody").style.visibility = "visible";
                            document.body.style.cursor = 'default';
                        }
                    } else {
                        document.login.action = document.login.action + "&sso=n";
                        document.getElementById("uoa_loading").style.visibility = "hidden";
                        document.getElementById("uoa_ssobody").style.visibility = "hidden";
                        document.getElementById("uoa_vanillabody").style.visibility = "visible"; //     document.login.pwd.type="password"; IE8 doesn't like this      changeInputType(document.getElementById('pwd'),'password','',true,true);     document.body.style.cursor='default';}}}
    </script>
</body>

</html>

Edit 2: My GET method:

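// Relies on members declared elsewhere in the class (not shown in the post):
// conn (HttpsURLConnection), cookies (a collection of Set-Cookie strings),
// USER_AGENT (String) and setCookies(...).
// Needs java.io.BufferedReader, java.io.InputStreamReader, java.net.URL
// and javax.net.ssl.HttpsURLConnection.
private String GetPageContent(String url) throws Exception {

    URL obj = new URL(url);
    conn = (HttpsURLConnection) obj.openConnection();

    // default is GET
    conn.setRequestMethod("GET");

    conn.setUseCaches(false);

    // act like a browser
    conn.setRequestProperty("User-Agent", USER_AGENT);
    conn.setRequestProperty("Accept",
            "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
    conn.setRequestProperty("Accept-Language", "en-US,en;q=0.5");

    if (cookies != null) {
        for (String cookie : this.cookies) {
            // Keep only the leading "name=value" pair. A limit of 1 would return
            // the whole string, attributes included, so the limit must be 2.
            conn.addRequestProperty("Cookie", cookie.split(";", 2)[0]);
        }
    }

    // getResponseCode() is what actually sends the request
    int responseCode = conn.getResponseCode();
    System.out.println("\nSent 'GET' request to URL : " + url);
    System.out.println("Response Code : " + responseCode);

    // Read the body; line breaks are dropped, as in the original version
    StringBuilder response = new StringBuilder();
    try (BufferedReader in = new BufferedReader(new InputStreamReader(
            conn.getInputStream()))) {
        String inputLine;
        while ((inputLine = in.readLine()) != null) {
            response.append(inputLine);
        }
    }
    // Final URL after any HTTP redirects were followed
    System.out.println(conn.getURL());

    // Get the response cookies
    setCookies(conn.getHeaderFields().get("Set-Cookie"));

    return response.toString();
}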

1 answer:

Answer 0 (score: 0)

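Which servlet does your URL reference, PSC or PSP?

The correct link for screen-scraping PeopleSoft is the PSC one, because PSP acts as a reverse proxy that loads the content (psc) inside an iFrame. That may be causing what you are seeing.

If you hit the psp servlet, the HTTP GET request you are making gets its 200 response while the iFrame is still loading the psc content. Really, you only need the psc content, so reference it directly.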
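As a rough illustration of "reference it directly", here is a minimal sketch that rewrites a psp URL into its psc counterpart before the GET request is made. It assumes the servlet name is the first path segment (`/psp/...` or `/psc/...`); the class name `PeopleSoftUrls`, the method `toPscUrl`, and the example host and path are made up for illustration and are not taken from the question.

```java
import java.net.URI;

// Hypothetical helper, not part of the asker's code.
final class PeopleSoftUrls {

    private static final String PSP_PREFIX = "/psp/";
    private static final String PSC_PREFIX = "/psc/";

    // Returns the same URL with a leading "/psp/" path segment replaced by "/psc/".
    // URLs whose path does not start with "/psp/" are returned unchanged.
    static String toPscUrl(String url) {
        URI uri = URI.create(url);
        String path = uri.getRawPath();
        if (path == null || !path.startsWith(PSP_PREFIX)) {
            return url;
        }
        StringBuilder result = new StringBuilder()
                .append(uri.getScheme())
                .append("://")
                .append(uri.getRawAuthority())
                .append(PSC_PREFIX)
                .append(path.substring(PSP_PREFIX.length()));
        if (uri.getRawQuery() != null) {
            result.append('?').append(uri.getRawQuery());
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // Example URL is an assumption, shown only to demonstrate the rewrite.
        System.out.println(toPscUrl(
                "https://example.edu/psp/ps/EMPLOYEE/HRMS/c/SOME_MENU.SOME_COMPONENT.GBL?Page=X"));
    }
}
```

The rewritten string could then be passed to the `GetPageContent` method from the question. Whether the psc page is reachable without first completing the sign-in flow depends on the site, so this only addresses the iFrame wrapper, not authentication.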