crawler4j prints out an enormous amount of system output

Time: 2013-11-20 11:36:10

Tags: crawler4j

I started using crawler4j and worked with the BasicCrawler example for a while. I removed all of the output from the BasicCrawler.visit() method and then added some URL handling I already had. When I start the program now, it suddenly prints a huge amount of internal processing information that I really don't need. See the example below:

Auth cache not set in the context
Target auth state: UNCHALLENGED
Proxy auth state: UNCHALLENGED
Attempt 1 to execute request
Sending request: GET /section.aspx?cat=7 HTTP/1.1
"GET /section.aspx?cat=7 HTTP/1.1[\r][\n]"
>> "Accept-Encoding: gzip[\r][\n]"
>> "Host: www.dailytech.com[\r][\n]"
>> "Connection: Keep-Alive[\r][\n]"
>> "User-Agent: crawler4j (http://code.google.com/p/crawler4j/)[\r][\n]"
>> "Cookie: DTLASTVISITED=11/20/2013 6:16:52 AM; DTLASTVISITEDSYS=11/20/2013 6:16:48 AM;     MF2=vaxc1b832fex; dtusession=dcef3fc0-dc04-4f13-8028-186aea942c3f[\r][\n]"
>> "[\r][\n]"
>> GET /section.aspx?cat=7 HTTP/1.1
>> Accept-Encoding: gzip
>> Host: www.dailytech.com
>> Connection: Keep-Alive
>> User-Agent: crawler4j (http://code.google.com/p/crawler4j/)
>> Cookie: DTLASTVISITED=11/20/2013 6:16:52 AM; DTLASTVISITEDSYS=11/20/2013 6:16:48 AM; MF2=vaxc1b832fex; dtusession=dcef3fc0-dc04-4f13-8028-186aea942c3f
<< "HTTP/1.1 200 OK[\r][\n]"
<< "Cache-Control: private[\r][\n]"
<< "Content-Type: text/html; charset=utf-8[\r][\n]"
<< "Content-Encoding: gzip[\r][\n]"
<< "Vary: Accept-Encoding[\r][\n]"
<< "Server: Microsoft-IIS/7.5[\r][\n]"
<< "X-AspNet-Version: 4.0.30319[\r][\n]"
<< "Set-Cookie: DTLASTVISITED=11/20/2013 6:16:54 AM; domain=dailytech.com; expires=Tue,     20-Nov-2018 11:16:54 GMT; path=/[\r][\n]"
<< "Set-Cookie: DTLASTVISITEDSYS=11/20/2013 6:16:48 AM; domain=dailytech.com; path=/[\r][\n]"
<< "X-UA-Compatible: IE=EmulateIE7[\r][\n]"
<< "Date: Wed, 20 Nov 2013 11:16:54 GMT[\r][\n]"
<< "Content-Length: 8235[\r][\n]"
<< "[\r][\n]"
Receiving response: HTTP/1.1 200 OK
<< HTTP/1.1 200 OK
<< Cache-Control: private
<< Content-Type: text/html; charset=utf-8
<< Content-Encoding: gzip
<< Vary: Accept-Encoding
<< Server: Microsoft-IIS/7.5
<< X-AspNet-Version: 4.0.30319
<< Set-Cookie: DTLASTVISITED=11/20/2013 6:16:54 AM; domain=dailytech.com; expires=Tue, 20-Nov-2018 11:16:54 GMT; path=/
<< Set-Cookie: DTLASTVISITEDSYS=11/20/2013 6:16:48 AM; domain=dailytech.com; path=/
<< X-UA-Compatible: IE=EmulateIE7
<< Date: Wed, 20 Nov 2013 11:16:54 GMT
<< Content-Length: 8235
Cookie accepted: "[version: 0][name: DTLASTVISITED][value: 11/20/2013 6:16:5
AM][domain:dailytech.com][path: /][expiry: Tue Nov 20 12:16:54 CET 2018]".
Cookie accepted: "[version: 0][name: DTLASTVISITEDSYS][value: 11/20/2013 6:16:48
AM][domain: dailytech.com][path: /][expiry: null]". 
Connection can be kept alive indefinitely
<< "[0x1f]"
<< "[0x8b]"
<< "[0x8]"
<< "[0x0]"
<< "[0x0][0x0][0x0][0x0][0x4][0x0]"
<< "[0xed][0xbd][0x7]`[0x1c]I[0x96]%&/m[0xca]{J[0xf5]J[0xd7][0xe0]t[0xa1]
[0x8][0x80]`[0x13]$[0xd8][0x90]@[0x10][0xec][0xc1][0x88][0xcd][0xe6][0x92][0xec]
[0x1d]iG#)[0xab]*[0x81][0xca]eVe]f[0x16]@[0xcc][0xed][0x9d][0xbc][0xf7][0xde]{[0xef]
[0xbd][0xf7][0xde]{[0xef][0xbd][0xf7][0xba];[0x9d]N'[0xf7][0xdf][0xff]?\fd[0x1]l[0xf6]
[0xce]J[0xda][0xc9][0x9e]![0x80][0xaa][0xc8][0x1f]?~|[0x1f]?"~[0xe3][0xe4]7N[0x1e]
[0xff][0xae]O[0xbf]<y[0xf3][0xfb][0xbc]<M[0xe7][0xed][0xa2]L_~[0xf5][0xe4][0xf9]

Is there any way to disable all of this output? Or does anyone know what is causing it? Could this be a bug that I should report to the community?

Thanks for your time.

1 answer:

Answer 0 (score: 3)

I found the answer to my problem. I had changed the method name from main(String[] args) to crawl(), and after that crawler4j started printing out the debugging output. Once I changed my log4j.properties, it went away.
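For anyone hitting the same thing: the verbose lines in the question come from Apache HttpClient's header and wire logging, which crawler4j uses under the hood. A minimal log4j.properties sketch that quiets it could look like the following; the appender setup and logger categories are typical log4j 1.x conventions and are my assumption, not taken from the original post:

# minimal sketch, assuming a log4j 1.x setup with console output
log4j.rootLogger=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d %-5p %c - %m%n

# raise the threshold for HttpClient's chatty categories
# (the "Sending request"/"Receiving response" lines and the ">> ..."/"<< ..." wire dump)
log4j.logger.org.apache.http=WARN
log4j.logger.org.apache.http.wire=WARN
log4j.logger.org.apache.http.headers=WARN

Note that with the root logger at WARN, crawler4j's own INFO messages are suppressed as well; if you still want the crawler's progress output, keep the root logger at INFO and only raise the org.apache.http categories.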