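How do I read and locate where I need to enter login information with jsoup in order to access a site on a VPN? I'm interested in an explanation of the steps/topics involved as well as the programming approach in Java (basically, how to write the code in Java using jsoup). Note: with all the redirects, I'm having a hard time understanding what happens during a jsoup login and how/when/where to code for it.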
Here is my workflow so far:
I have a target page, as shown below:
[debug] status code 302 : https://centrale.landingnetwork.com/gp/stores/www.landingnetwork.com/gp/home/
Corresponding headers:
{Server=Server, Date=Fri, 02 Mar 2018 04:36:49 GMT, Content-Type=text/html; charset=UTF-8, Transfer-Encoding=chunked, Connection=keep-alive, x-REQUESTNAME-id-1=ZZZAAA3YYYBBB9CCC999, x-frame-options=SAMEORIGIN, x-REQUESTNAME-id-2=123aaaWww1111iiiiix7777yyyzzzqqqhhhiiiE/wPUx/IaHiw6hfs7Y7/Gwa1X0, Location=https://centrale.landingnetwork.com/gp/stores/www.landingnetwork.com/gp/signin/gi-signin.html/123-1234567-1234567?ie=UTF8&landat=%2Fgp%2Fstores%2Fwww.landingnetwork.com%2Fgp%2Fhome%2F123-1234567-1234567&ort=1122334455.98765&rrt=1112223334.12121, Vary=Accept-Encoding,User-Agent, Content-Encoding=gzip, Set-cookie=session-id-scsus=123-1234567-1234567; path=/; domain=.landingnetwork.com; expires=Tue, 01-Jan-2036 00:00:01 GMT}
When I navigate to this URL in Java/jsoup, I get a series of redirects. Here is my redirect trail, in order:
[debug] status code 302 : https://centrale.landingnetwork.com/gp/stores/www.landingnetwork.com/gp/signin/gi-signin.html/123-1234567-1234567?ie=UTF8&landat=%2Fgp%2Fstores%2Fwww.landingnetwork.com%2Fgp%2Fhome%2F123-1234567-1234567&ort=1122334455.98765&rrt=1112223334.12121
Corresponding headers:
{Server=Server, Date=Fri, 02 Mar 2018 04:36:49 GMT, Content-Type=text/html; charset=UTF-8, Transfer-Encoding=chunked, Connection=keep-alive, x-REQUESTNAME-id-1=ZZZAAA3YYYBBB9CCC999, x-frame-options=SAMEORIGIN, x-REQUESTNAME-id-2=123aaaWww1111iiiiix7777yyyzzzqqqhhhiiiEwwPwwwIaHiw6hfs7Y7vvva1X0, Location=https://wa.secureallnetwork.com/login?clienteId=Centrale-prod-wa&nonce=867:5309:867:5309:867:5309:867:5309:867:53099&redirect_uri=https%3A%2F%2Fcentrale.landingnetwork.com%3A443%2Fgp%2Fstores%2Fwww.landingnetwork.com%2Fgp%2Fsignin%2Fgi-landat.html%2F123-1234567-1234567%3Flandat%3D%2Fgp%2Fstores%2Fwww.landingnetwork.com%2Fgp%2Fhome%2F123-1234567-1234567&ort=1122334455.98765&rrt=1112223334.12121, Vary=Accept-Encoding,User-Agent, Content-Encoding=gzip, Set-cookie=session-id-scsus=123-1234567-1234567; path=/; domain=.landingnetwork.com; expires=Tue, 01-Jan-2036 00:00:01 GMT}
Next link in the chain:
[debug] status code 200 : https://wa.secureallnetwork.com/login?clienteId=Centrale-prod-wa&nonce=867:5309:867:5309:867:5309:867:5309:867:53099&redirect_uri=https%3A%2F%2Fcentrale.landingnetwork.com%3A443%2Fgp%2Fstores%2Fwww.landingnetwork.com%2Fgp%2Fsignin%2Fgi-landat.html%2F123-1234567-1234567%3Flandat%3D%2Fgp%2Fstores%2Fwww.landingnetwork.com%2Fgp%2Fhome%2F123-1234567-1234567&ort=1122334455.98765&rrt=1112223334.12121
Corresponding headers:
{Server=Server, Date=Fri, 02 Mar 2018 04:36:50 GMT, Content-Type=text/html;charset=UTF-8, Transfer-Encoding=chunked, Connection=keep-alive, Strict-Transport-Security=max-age=31536000; includeSubdomains; preload, Content-Language=en-US, Content-Encoding=gzip, Vary=Accept-Encoding,User-Agent, Set-Cookie=session-id=123-1234567-1234567; Domain=.landingnetwork.com; Expires=Tue, 01-Jan-2036 08:00:01 GMT; Path=/}
Next link in the chain:
[debug] status code 200 : https://wa.secureallnetwork.com/login?sif_profile=gi_profile_1&clienteId=Centrale-prod-wa&nonce=867:5309:867:5309:867:5309:867:5309:867:53099&redirect_uri=https://centrale.landingnetwork.com:443/gp/stores/www.landingnetwork.com/gp/signin/gi-landat.html/123-1234567-1234567?landat=/gp/stores/www.landingnetwork.com/gp/home/123-1234567-1234567
Corresponding headers:
{Server=Server, Date=Fri, 02 Mar 2018 04:36:50 GMT, Content-Type=text/html;charset=UTF-8, Transfer-Encoding=chunked, Connection=keep-alive, Strict-Transport-Security=max-age=31536000; includeSubdomains; preload, Content-Language=en-US, Content-Encoding=gzip, Vary=Accept-Encoding,User-Agent, Set-Cookie=ubid-main=123-1234567-1234567; Domain=.landingnetwork.com; Expires=Tue, 01-Jan-2036 08:00:01 GMT; Path=/}
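To make the trail above easier to read, it can help to split each Location URL into its query parameters and URL-decode them. The sketch below uses only the JDK; the class name RedirectParams and the shortened URL in main are made up for illustration. The redirect_uri parameter is presumably the address the login server sends the browser back to after sign-in, but that is an assumption based on its name, not something the headers prove.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: lists the decoded query parameters of a redirect target.
final class RedirectParams {

    // URL-decodes one query component as UTF-8.
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Splits the query string of a URL into decoded name/value pairs, keeping their order.
    static Map<String, String> queryParams(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        String rawQuery = URI.create(url).getRawQuery();
        if (rawQuery == null) {
            return params; // no query string at all
        }
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(name), decode(value));
        }
        return params;
    }

    public static void main(String[] args) {
        // Shortened version of the second Location header, for readability only.
        String location = "https://wa.secureallnetwork.com/login?clienteId=Centrale-prod-wa"
                + "&nonce=867:5309"
                + "&redirect_uri=https%3A%2F%2Fcentrale.landingnetwork.com%3A443%2Fgp%2Fhome%2F";
        for (Map.Entry<String, String> e : queryParams(location).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Printing the parameters of each hop side by side shows which values (nonce, ort, rrt) change between requests and which are simply passed along.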
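Edit: I'm fairly sure the nonce values don't need to be the exact values shown, so I replaced them with generic values. Same idea for the &rrt= and &ort= values. (If these matter for my task, please explain.)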
edit2: Added the corresponding headers for each link.
edit2: Also, here is the value of the login form's action= attribute:
/login?sif_profile=gi_profile_1&clientId=Centrale-prod-na&nonce=867:5309:867:5309:867:5309:867:5309:867:53099&redirect_uri=https://centrale.landingnetwork.com:443/gp/stores/www.landingnetwork.com/gp/signin/gi-landat.html/123-1234567-1234567?landat=/gp/stores/www.landingnetwork.com/gp/home/123-1234567-1234567
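The action value above is relative (it starts with /login), so it has to be resolved against the URL of the page that served the form before anything can be posted to it. A minimal sketch of that resolution using only java.net.URI follows; the class name FormActionResolver and the shortened URLs in main are illustrative assumptions. When the document comes from a jsoup response, calling absUrl("action") on the form element should give the same result.

```java
import java.net.URI;

// Hypothetical helper: turns a form's action attribute into an absolute URL.
final class FormActionResolver {

    // Resolves a possibly relative action against the URL of the page containing the form.
    // An action that is already absolute is returned unchanged.
    static String resolve(String pageUrl, String action) {
        return URI.create(pageUrl).resolve(action).toString();
    }

    public static void main(String[] args) {
        // Shortened versions of the login page URL and the form action, for readability.
        String page = "https://wa.secureallnetwork.com/login?clienteId=Centrale-prod-wa";
        String action = "/login?sif_profile=gi_profile_1&clientId=Centrale-prod-na";
        System.out.println(resolve(page, action));
    }
}
```

Resolving against the page URL avoids hard-coding the host, which matters if the login page ever moves to a different domain than the one the redirect chain started on.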
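Now, I don't have a strong background in networking, but if it's explained well and thoroughly, I can definitely follow along.
My problem: as I walk through the redirects, I can't tell why my form-posting code for the username/password doesn't work.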
edit3: Here is the request header information during navigation, as shown in Chrome's Network tab:
POST /gp/stores/www.landingnetwork.com/gp/handlers/remote-view.html HTTP/1.1
Content-Length: 6013
Accept: */*
Cookie: loc-main=en_US; x-xUid-uid=along141charactersofnonsesnsegoeshere; session-id-time=2082787201l; session-id=123-1234567-1234567; ubid-main=987-6543210-1234567; csrf=-2233445566; x-main="rr@DDhhzzqqeerrttyyuuiiooppkk@77"; at-main=alonglineof417charactersgoeshere; sess-at-main="mm2m/PPPPPPPPPPPPPPPPPPPooooooooooooooooooo="; sst-main=alonglineof200somecharactersgoeshere; session-id-scsus=123-1234567-1234567; session-id-time-scsus=2082758401l; ubid-scsus=987-6543210-1234567; session-token-scsus=hereisacoollineof256characters; skin=noskin; session-token="sessiontokenof268charactersgoeshere"; sidna-p=43charactershere; gidna-p=amonster1607charactershere; cscscs-p=::777::D11vvvvZZZZZZZZZzzzMMMMwwwwwjjjjjeeeeeeevvvvvvbbbbbbbbbbmmmmmm77777777aaaaaa+w/wwwwwww==; csm-hit=222.33|1234567891011
POST /gp/stores/www.landingnetwork.com/gp/telephony/handlers/get-due-followup HTTP/1.1
Content-Length: 373
Accept: */*
Cookie: loc-main=en_US; x-xUid-uid=along141charactersofnonsesnsegoeshere; session-id-time=2082787201l; session-id=123-1234567-1234567; ubid-main=987-6543210-1234567; csrf=-2233445566; x-main="rr@DDhhzzqqeerrttyyuuiiooppkk@77"; at-main=alonglineof417charactersgoeshere; sess-at-main="mm2m/PPPPPPPPPPPPPPPPPPPooooooooooooooooooo="; sst-main=alonglineof200somecharactersgoeshere; session-id-scsus=123-1234567-1234567; session-id-time-scsus=2082758401l; ubid-scsus=987-6543210-1234567; session-token-scsus=hereisacoollineof256characters; skin=noskin; session-token="sessiontokenof268charactersgoeshere"; sidna-p=43charactershere; gidna-p=amonster1607charactershere; cscscs-p=::777::D11vvvvZZZZZZZZZzzzMMMMwwwwwjjjjjeeeeeeevvvvvvbbbbbbbbbbmmmmmm77777777aaaaaa+w/wwwwwww==; csm-hit=222.33|1234567891011
GET /taw/static/connect-csm.js?_=hereis13numbers HTTP/1.1
Accept: text/javascript, application/javascript, application/ecmascript, application/x-ecmascript, */*; q=0.01
Cookie: loc-main=en_US; x-xUid-uid=along141charactersofnonsesnsegoeshere; session-id-time=2082787201l; session-id=123-1234567-1234567; ubid-main=987-6543210-1234567; csrf=-2233445566; x-main="rr@DDhhzzqqeerrttyyuuiiooppkk@77"; at-main=alonglineof417charactersgoeshere; sess-at-main="mm2m/PPPPPPPPPPPPPPPPPPPooooooooooooooooooo="; sst-main=alonglineof200somecharactersgoeshere; session-id-scsus=123-1234567-1234567; session-id-time-scsus=2082758401l; ubid-scsus=987-6543210-1234567; session-token-scsus=hereisacoollineof256characters; skin=noskin; session-token="sessiontokenof268charactersgoeshere"; sidna-p=43charactershere; gidna-p=amonster1607charactershere; cscscs-p=::777::D11vvvvZZZZZZZZZzzzMMMMwwwwwjjjjjeeeeeeevvvvvvbbbbbbbbbbmmmmmm77777777aaaaaa+w/wwwwwww==; csm-hit=222.33|1234567891011
GET /taw/static/secureall-conduit.js?_=hereis13numbers HTTP/1.1
Accept: text/javascript, application/javascript, application/ecmascript, application/x-ecmascript, */*; q=0.01
Cookie: loc-main=en_US; x-xUid-uid=along141charactersofnonsesnsegoeshere; session-id-time=2082787201l; session-id=123-1234567-1234567; ubid-main=987-6543210-1234567; csrf=-2233445566; x-main="rr@DDhhzzqqeerrttyyuuiiooppkk@77"; at-main=alonglineof417charactersgoeshere; sess-at-main="mm2m/PPPPPPPPPPPPPPPPPPPooooooooooooooooooo="; sst-main=alonglineof200somecharactersgoeshere; session-id-scsus=123-1234567-1234567; session-id-time-scsus=2082758401l; ubid-scsus=987-6543210-1234567; session-token-scsus=hereisacoollineof256characters; skin=noskin; session-token="sessiontokenof268charactersgoeshere"; sidna-p=43charactershere; gidna-p=amonster1607charactershere; cscscs-p=::777::D11vvvvZZZZZZZZZzzzMMMMwwwwwjjjjjeeeeeeevvvvvvbbbbbbbbbbmmmmmm77777777aaaaaa+w/wwwwwww==; csm-hit=222.33|1234567891011
GET /taw/get-csm-parameters HTTP/1.1
Accept: application/json, text/javascript, */*; q=0.01
Cookie: loc-main=en_US; x-xUid-uid=along141charactersofnonsesnsegoeshere; session-id-time=2082787201l; session-id=123-1234567-1234567; ubid-main=987-6543210-1234567; csrf=-2233445566; x-main="rr@DDhhzzqqeerrttyyuuiiooppkk@77"; at-main=alonglineof417charactersgoeshere; sess-at-main="mm2m/PPPPPPPPPPPPPPPPPPPooooooooooooooooooo="; sst-main=alonglineof200somecharactersgoeshere; session-id-scsus=123-1234567-1234567; session-id-time-scsus=2082758401l; ubid-scsus=987-6543210-1234567; session-token-scsus=hereisacoollineof256characters; skin=noskin; session-token="sessiontokenof268charactersgoeshere"; sidna-p=43charactershere; gidna-p=amonster1607charactershere; cscscs-p=::777::D11vvvvZZZZZZZZZzzzMMMMwwwwwjjjjjeeeeeeevvvvvvbbbbbbbbbbmmmmmm77777777aaaaaa+w/wwwwwww==; csm-hit=222.33|1234567891011
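The Cookie header above is what the browser sends once it is logged in. One way to see what a jsoup session is missing is to parse that header into a map and compare its cookie names with the map returned by the jsoup response's cookies() method. The sketch below is JDK-only; the class name CookieHeaderDiff and the sample values in main are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: compares the cookies a browser sends with the ones a client holds.
final class CookieHeaderDiff {

    // Parses a Cookie request header value ("a=1; b=2") into an ordered name -> value map.
    // Only the first '=' separates name from value, so values ending in "==" survive intact.
    static Map<String, String> parseCookieHeader(String header) {
        Map<String, String> cookies = new LinkedHashMap<>();
        for (String part : header.split(";")) {
            String pair = part.trim();
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // skip empty or malformed segments
            }
            cookies.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return cookies;
    }

    // Returns the cookie names present in the browser's map but absent from the client's.
    static Set<String> missing(Map<String, String> browser, Map<String, String> client) {
        Set<String> names = new LinkedHashSet<>(browser.keySet());
        names.removeAll(client.keySet());
        return names;
    }

    public static void main(String[] args) {
        // Shortened sample of the browser's Cookie header.
        Map<String, String> browser = parseCookieHeader(
                "loc-main=en_US; session-id=123-1234567-1234567; at-main=alonglinehere; csrf=-2233445566");
        // Stand-in for what the jsoup response's cookies() map might contain before login.
        Map<String, String> client = new LinkedHashMap<>();
        client.put("session-id", "123-1234567-1234567");
        System.out.println("Missing from client: " + missing(browser, client));
    }
}
```

Cookies that only appear in the browser's list are candidates for values set during or after the login POST, or by JavaScript, which jsoup does not execute.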
Here is my code so far (two classes):
import java.io.IOException;
import java.net.SocketException;
import java.util.HashMap;
import org.jsoup.Connection;
import org.jsoup.Connection.Response;
import org.jsoup.Jsoup;
import org.jsoup.UncheckedIOException;
import org.jsoup.nodes.Document;

public class App {
    public static final String USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0";
    public static final String LOGIN_FORM_URL = "https://centrale.landingnetwork.com/gp/stores/www.landingnetwork.com/gp/home/";
    public static final String USERNAME = "myusername";
    public static final String PASSWORD = "mupassword";

    public static void main(String[] args) throws Exception {
        WebCrawler wc = new WebCrawler();
        // # Go to login page and grab cookies sent by server
        Connection.Response loginForm = wc.crawl(LOGIN_FORM_URL);
        // this is the document containing response html
        Document loginDoc = loginForm.parse();
        // save the cookies to be passed on to next request
        HashMap<String, String> cookies = new HashMap<>(loginForm.cookies());
        // # Prepare login credentials
        String authToken = loginDoc.select("form").attr("class", "a-spacing-micro").first().attr("action");
        HashMap<String, String> formData = new HashMap<>();
        formData.put("usernameInputField", USERNAME);
        formData.put("passwordInputField", PASSWORD);
        Connection.Response homePage = wc.crawl("https://wa.secureallnetwork.com" + authToken, cookies, formData, Connection.Method.POST, true);
    }
}
import java.io.IOException;
import java.util.HashMap;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.Connection.Response;

public class WebCrawler {
    public static final String USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0";

    public Connection.Response crawl(String URL) throws IOException {
        Response response = Jsoup.connect(URL).userAgent(USER_AGENT).followRedirects(false).execute();
        if (response.hasHeader("location")) {
            String redirectUrl = response.header("location");
            return crawl(redirectUrl);
        } else {
            return response;
        }
    }

    public Connection.Response crawl(String URL, HashMap<String, String> cooks, HashMap<String, String> dat, Connection.Method m, boolean follow) throws IOException {
        Response response = Jsoup.connect(URL).userAgent(USER_AGENT).cookies(cooks).data(dat).followRedirects(follow).method(m).execute();
        if (response.hasHeader("location")) {
            String redirectUrl = response.header("location");
            return crawl(redirectUrl);
        } else {
            return response;
        }
    }
}
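One thing worth noting about the two crawl methods above: each hop opens a fresh connection, so cookies set by a 302 response are not sent with the next request, and only the last response's cookies reach the caller. The sketch below shows one way a redirect walk could accumulate cookies across hops. It is JDK-only so it can run without a network; Hop, Fetcher, Result, and RedirectWalker are hypothetical names. In a jsoup version, the Fetcher would presumably wrap Jsoup.connect(url).cookies(...).followRedirects(false).execute(), wrapping the IOException in an unchecked exception, and read response.cookies() and response.header("location").

```java
import java.net.URI;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: follows redirects by hand while keeping one cookie jar for all hops.
final class RedirectWalker {

    /** One response, reduced to the three things the walk needs. */
    static final class Hop {
        final int status;
        final String location;                // null when no Location header was sent
        final Map<String, String> setCookies; // cookies set by this response

        Hop(int status, String location, Map<String, String> setCookies) {
            this.status = status;
            this.location = location;
            this.setCookies = setCookies;
        }
    }

    /** Stand-in for the real HTTP call; receives the cookies collected so far. */
    interface Fetcher {
        Hop fetch(String url, Map<String, String> cookies);
    }

    /** Where the walk ended and every cookie collected along the way. */
    static final class Result {
        final String finalUrl;
        final Map<String, String> cookies;

        Result(String finalUrl, Map<String, String> cookies) {
            this.finalUrl = finalUrl;
            this.cookies = cookies;
        }
    }

    // Requests startUrl, then keeps following Location headers for at most maxHops requests.
    static Result walk(String startUrl, Fetcher fetcher, int maxHops) {
        Map<String, String> jar = new LinkedHashMap<>();
        String url = startUrl;
        for (int hops = 0; hops < maxHops; hops++) {
            Hop hop = fetcher.fetch(url, Collections.unmodifiableMap(jar));
            jar.putAll(hop.setCookies); // later hops overwrite earlier values of the same name
            if (hop.location == null) {
                return new Result(url, jar);
            }
            // Location may be relative, so resolve it against the current URL.
            url = URI.create(url).resolve(hop.location).toString();
        }
        throw new IllegalStateException("More than " + maxHops + " hops starting at " + startUrl);
    }

    public static void main(String[] args) {
        // Fake two-hop server: a 302 that sets a cookie, then a 200 that sets another.
        Fetcher fake = (url, cookies) -> url.endsWith("/home/")
                ? new Hop(302, "/signin", Collections.singletonMap("session-id-scsus", "123-1234567-1234567"))
                : new Hop(200, null, Collections.singletonMap("ubid-main", "987-6543210-1234567"));
        Result result = walk("https://centrale.landingnetwork.com/gp/home/", fake, 10);
        System.out.println(result.finalUrl + " " + result.cookies.keySet());
    }
}
```

The hop limit guards against redirect loops, which the recursive crawl methods would otherwise follow until the stack overflows.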
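When I print out the headers, they look relatively simple; the only things that might stand out are the 'X-REQUEST-ID1' / 'X-REQUEST-ID2' headers, the Set-Cookie session ID, and Location. But I'm fairly sure that is not where my main problem lies. I think it has more to do with how I'm trying to interact with data across multiple web pages through jsoup.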
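To restate my question: how do I log in to my website programmatically using Java/jsoup? If anyone is willing to take the time, a thorough explanation with details, examples, and final code would be a glorious lesson!
Cheers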