I am running
wget --recursive --no-parent --adjust-extension --convert-links --page-requisites --restrict-file-names=windows --keep-session-cookies --load-cookies cookies.txt http://DOMAIN/private/
and it correctly downloads the private/index.html file.
I inspected this file, and only a successful authentication would produce the correct page. It contains markup such as:
<ul><li><a class="CP___PAGEID_56400" href="http://DOMAIN/private/page1.html">My private page</a></li>...
However, after fetching all of the page requisites (images and so on), wget seems to decide it is finished and exits after "converting links".
If I drop --no-parent, it keeps going. So does the --no-parent flag somehow confuse wget?
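One quick check in a situation like this (as the answer below shows, robots.txt turned out to be the culprit) is to fetch the site's robots file directly and see what it disallows. This is only a suggested diagnostic, with DOMAIN being the same placeholder used above:

# Print the site's robots.txt to see whether /private/ is disallowed for crawlers.
wget -q -O - http://DOMAIN/robots.txt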
Answer 0 (score: 0):
Finally realized that wget was obeying robots.txt! I changed the command so that robots.txt is ignored
and got it working. I added --wait 0.25 because I don't want to hammer the server.
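The revised command itself was lost above, but a plausible reconstruction, assuming the standard -e robots=off switch (which tells wget to ignore robots.txt) together with the --wait 0.25 mentioned, would look like this:

# Hedged sketch, not necessarily the poster's exact command:
# -e robots=off disables robots.txt handling; --wait 0.25 pauses between requests.
wget --recursive --no-parent --adjust-extension --convert-links --page-requisites \
     --restrict-file-names=windows --keep-session-cookies --load-cookies cookies.txt \
     -e robots=off --wait 0.25 http://DOMAIN/private/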