Log in to a website and fetch the edit-profile page with Java and HttpUnit

Posted: 2018-06-08 13:36:20

Tags: java login web-scraping http-unit

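I have a general task: log in to a website and check whether a certain page responds. I started with Stack Overflow because I don't have a website of my own to test against. I have no intention of scraping or hacking Stack Overflow; it is only for testing.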

My problems with the following code are:

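  1. The control that submits the Stack Overflow login form is not a submit button but a plain button. When I enabled JavaScript so that button.click(); would work, I got the exception "org/mozilla/javascript/NotAFunctionException". So I turned scripting off and submit the form directly from the Java code instead (form.submitNoButton(); in the code below).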

  2. However, when the code then loads the user profile page, I get a 404 error. IMHO this means that no cookies are being added to the HttpUnit conversation.

    import java.io.IOException;

    import org.xml.sax.SAXException;

    import com.meterware.httpunit.*;

    private static String tryLoginAndConnect(WebService webService) {
        // Don't abort on JavaScript errors raised by the page
        HttpUnitOptions.setExceptionsThrownOnScriptError(false);
        // The Rhino web.js is now present, so the HTMLControl class is found,
        // but scripting must stay off to avoid org/mozilla/javascript/NotAFunctionException
        HttpUnitOptions.setScriptingEnabled(false);

        // A single WebConversation holds the cookies for every request below
        WebConversation conversation = new WebConversation();

        // 1. Load the page that contains the login form
        WebRequest request;
        if (webService.getWebServiceAuthMethod().equalsIgnoreCase("POST")) {
            request = new PostMethodWebRequest(webService.getWebServiceAuthUrl());
        } else {
            request = new GetMethodWebRequest(webService.getWebServiceAuthUrl());
        }

        WebResponse response;
        try {
            response = conversation.getResponse(request);
        } catch (IOException | SAXException e) {
            e.printStackTrace();
            return e.toString();
        }

        if (response == null) {
            return "Could not access authentication at " + webService.getWebServiceAuthUrl();
        }

        // 2. Find the login form: by id first, then by name.
        //    getFormWithID() returns null (it does not throw) when no form matches.
        WebForm form;
        try {
            logger.info("Get form by id");
            form = response.getFormWithID(webService.getFormName());
            if (form == null) {
                logger.info("Get form by name");
                form = response.getFormWithName(webService.getFormName());
            }
        } catch (SAXException e) {
            e.printStackTrace();
            return e.toString();
        }

        if (form == null) {
            return "Could not access form at " + webService.getFormName();
        }

        form.setParameter(webService.getParameterNameUsername(), webService.getUsername());
        form.setParameter(webService.getParameterNamePassword(), webService.getPassword());

        // 3. Submit the form and keep the login response in both branches
        SubmitButton sb = form.getSubmitButtonWithID(webService.getFormSubmitButton());
        try {
            if (sb != null) {
                response = form.submit(sb);
            } else {
                logger.info("Could not access submit button at " + webService.getFormSubmitButton());

                // The login form uses a plain button rather than a submit button
                Button b = form.getButtonWithID(webService.getFormSubmitButton());
                if (b == null) {
                    logger.info("Could not access button at " + webService.getFormSubmitButton());
                    return "Could not access button at " + webService.getFormSubmitButton();
                }
                // UNDONE: b.click() needs JavaScript, which throws NotAFunctionException,
                // so submit the form without a button instead
                response = form.submitNoButton();
            }
        } catch (IOException | SAXException e) {
            e.printStackTrace();
            return e.toString();
        }

        if (response == null) {
            return "Could not access login";
        }

        // 4. Access the page we actually want, in the same conversation
        WebRequest actualRequest;
        if (webService.getWebServiceMethod().equalsIgnoreCase("POST")) {
            actualRequest = new PostMethodWebRequest(webService.getWebServiceUrl());
        } else {
            actualRequest = new GetMethodWebRequest(webService.getWebServiceUrl());
        }

        WebResponse actualResponse;
        try {
            actualResponse = conversation.getResponse(actualRequest);
        } catch (IOException | SAXException e) {
            e.printStackTrace();
            return e.toString();
        }

        if (actualResponse == null) {
            return "Could not access actual page at " + webService.getWebServiceUrl();
        }

        TextBlock[] texts;
        try {
            texts = actualResponse.getTextBlocks();
        } catch (SAXException e) {
            e.printStackTrace();
            return e.toString();
        }

        // 5. Succeed (return null) only if some text block contains the needle
        String errorMessage = "Did not find needle " + webService.getWebServiceNeedle();
        for (TextBlock tb : texts) {
            if (tb.getText().contains(webService.getWebServiceNeedle())) {
                logger.info("Found needle " + webService.getWebServiceNeedle());
                errorMessage = null;
                break;
            }
        }

        return errorMessage;
    }
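Whether a conversation keeps a cookie at all depends on matching the cookie's Domain attribute against the host that set it, and the cookie is only sent back to hosts that match. As a point of comparison only (HttpUnit has its own cookie code and may apply different rules), the JDK's java.net.HttpCookie.domainMatches implements the RFC 2965 domain-match rule. This sketch checks the domain .stackoverflow.com, which appears in the cookie rejections in item 3, against a few hosts; the class name CookieDomainCheck and the host list are hypothetical and should be replaced with the hosts of the login and profile URLs.

```java
import java.net.HttpCookie;

/**
 * Hypothetical helper: checks a cookie Domain attribute against request hosts
 * using the JDK's RFC 2965 rule. This is NOT HttpUnit's cookie logic.
 */
public class CookieDomainCheck {

    static boolean matches(String cookieDomain, String requestHost) {
        return HttpCookie.domainMatches(cookieDomain, requestHost);
    }

    public static void main(String[] args) {
        // Domain attribute of the rejected "prov" cookie from the listener output
        String domain = ".stackoverflow.com";

        // Hypothetical hosts: substitute the hosts of your auth URL and profile URL
        String[] hosts = {
            "stackoverflow.com",
            "www.stackoverflow.com",
            "a.b.stackoverflow.com", // two extra labels: RFC 2965 does not match this
            "example.com"
        };

        for (String host : hosts) {
            System.out.println(host + " -> " + matches(domain, host));
        }
    }
}
```

If a host that HttpUnit is talking to does not match under this rule, that is at least a hint to compare it with the host that actually issued the cookie.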
    
  3. I added a cookie listener. It prints:

    DEBUG: com.ratormonitor.app.job.WebServiceMonitorJob - CookieProperties.isDomainMatchingStrict() = false
    DEBUG: com.ratormonitor.app.job.WebServiceMonitorJob - CookieProperties.isPathMatchingStrict() = false
    DEBUG: com.ratormonitor.app.job.WebServiceMonitorJob - Rejected  s = prov
    DEBUG: com.ratormonitor.app.job.WebServiceMonitorJob - s1 = .stackoverflow.com
    DEBUG: com.ratormonitor.app.job.WebServiceMonitorJob - Reason: 3
    DEBUG: com.ratormonitor.app.job.WebServiceMonitorJob - CookieProperties.isDomainMatchingStrict() = false
    DEBUG: com.ratormonitor.app.job.WebServiceMonitorJob - CookieProperties.isPathMatchingStrict() = false
    DEBUG: com.ratormonitor.app.job.WebServiceMonitorJob - Rejected  s = prov
    DEBUG: com.ratormonitor.app.job.WebServiceMonitorJob - s1 = .stackoverflow.com
    DEBUG: com.ratormonitor.app.job.WebServiceMonitorJob - Reason: 3
    DEBUG: com.ratormonitor.app.job.WebServiceMonitorJob - CookieProperties.isDomainMatchingStrict() = false
    DEBUG: com.ratormonitor.app.job.WebServiceMonitorJob - CookieProperties.isPathMatchingStrict() = false
    DEBUG: com.ratormonitor.app.job.WebServiceMonitorJob - Rejected  s = prov
    DEBUG: com.ratormonitor.app.job.WebServiceMonitorJob - s1 = .stackoverflow.com
    DEBUG: com.ratormonitor.app.job.WebServiceMonitorJob - Reason: 3
    

    The listener code:

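    import com.meterware.httpunit.cookies.CookieListener;
    import com.meterware.httpunit.cookies.CookieProperties;

    // Relax HttpUnit's cookie domain and path checks
    CookieProperties.setDomainMatchingStrict(false);
    CookieProperties.setPathMatchingStrict(false);

    // Log every cookie HttpUnit refuses to store
    CookieProperties.addCookieListener(new CookieListener() {
        // s: cookie name, i: rejection reason code, s1: rejected attribute (here the domain)
        public void cookieRejected(String s, int i, String s1) {
            logger.debug("CookieProperties.isDomainMatchingStrict() = " + CookieProperties.isDomainMatchingStrict());
            logger.debug("CookieProperties.isPathMatchingStrict() = " + CookieProperties.isPathMatchingStrict());
            logger.debug("Rejected  s = " + s);
            logger.debug("s1 = " + s1);
            logger.debug("Reason: " + i);
        }
    });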
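The listener logs the rejection reason only as a raw number (3). A small reflection helper can turn such a code into the name of the constant that declares it, which is easier to look up in the HttpUnit Javadoc. This is a sketch that assumes the reason codes are declared as public static int constants on CookieListener (check the Javadoc of the HttpUnit version in use); the class ConstantNames is a hypothetical name, and it hard-codes no HttpUnit constant names.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: maps an int code to the names of the public static
 * int constants with that value on a class or interface (e.g. CookieListener).
 */
public final class ConstantNames {

    private ConstantNames() {}

    static String nameOf(Class<?> type, int value) {
        List<String> names = new ArrayList<>();
        // getFields() returns public fields only, including inherited ones
        for (Field field : type.getFields()) {
            int modifiers = field.getModifiers();
            if (Modifier.isStatic(modifiers) && field.getType() == int.class) {
                try {
                    if (field.getInt(null) == value) {
                        names.add(field.getName());
                    }
                } catch (IllegalAccessException e) {
                    // Not expected for public fields; skip the field if it happens
                }
            }
        }
        // Several constants may share a value, so join all matching names
        return names.isEmpty() ? "unknown (" + value + ")" : String.join("/", names);
    }

    public static void main(String[] args) {
        // Demonstration with a JDK class; in the listener, pass CookieListener.class instead
        System.out.println(nameOf(java.net.HttpURLConnection.class, 404));
    }
}
```

Inside cookieRejected, the last log line could then become logger.debug("Reason: " + i + " = " + ConstantNames.nameOf(CookieListener.class, i));, still under the assumption above about how CookieListener declares its codes.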
    

0 answers