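I am using Spring Boot for a project, and one of its modules has to scrape a website to collect some static data, which is then saved in PostgreSQL... However, in my day-to-day work I have noticed strange connections to websites I do not recognize, such as baidu.com, cn.bing.com:443 and www.voanews.com:443.
I suspect these are related to my scraping connections, but I do not know where the connection attempts originate... How can I find out?
I am using gargoylesoftware (HtmlUnit) for the scraping, and my WildFly is served over HTTP, without HTTPS...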
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.HttpMethod;
import com.gargoylesoftware.htmlunit.Page;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebRequest;
import com.gargoylesoftware.htmlunit.html.DomElement;
import com.gargoylesoftware.htmlunit.html.DomNodeList;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlInput;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
These are the error logs from my WildFly...
2019-06-07 02:07:50,557 ERROR [io.undertow.request] (default I/O-1) UT005071: Undertow request failed HttpServerExchange{ CONNECT www.voanews.com:443 request {Proxy-Connection=[Keep-Alive], Proxy-Authorization=[Basic Og==], User-Agent=[PycURL/7.43.0 libcurl/7.47.0 GnuTLS/3.4.10 zlib/1.2.8 libidn/1.32 librtmp/2.3], Host=[www.voanews.com:443]} response {}}: java.lang.IllegalArgumentException: UT000068: Servlet path match failed
at io.undertow.servlet.handlers.ServletPathMatchesData.getServletHandlerByPath(ServletPathMatchesData.java:83)
at io.undertow.servlet.handlers.ServletPathMatches.getServletHandlerByPath(ServletPathMatches.java:88)
at io.undertow.servlet.handlers.ServletInitialHandler.handleRequest(ServletInitialHandler.java:151)
at io.undertow.server.handlers.HttpContinueReadHandler.handleRequest(HttpContinueReadHandler.java:65)
at io.undertow.server.handlers.PathHandler.handleRequest(PathHandler.java:94)
at org.wildfly.extension.undertow.Host$OptionsHandler.handleRequest(Host.java:386)
at io.undertow.server.handlers.HttpContinueReadHandler.handleRequest(HttpContinueReadHandler.java:65)
at org.wildfly.extension.undertow.Host$AcmeResourceHandler.handleRequest(Host.java:405)
at org.wildfly.extension.undertow.Host$HostRootHandler.handleRequest(Host.java:414)
at io.undertow.server.handlers.NameVirtualHostHandler.handleRequest(NameVirtualHostHandler.java:64)
at io.undertow.server.handlers.error.SimpleErrorPageHandler.handleRequest(SimpleErrorPageHandler.java:78)
at io.undertow.server.handlers.CanonicalPathHandler.handleRequest(CanonicalPathHandler.java:49)
at org.wildfly.extension.undertow.Server$DefaultHostHandler.handleRequest(Server.java:189)
at io.undertow.server.handlers.ChannelUpgradeHandler.handleRequest(ChannelUpgradeHandler.java:211)
at io.undertow.server.protocol.http2.Http2UpgradeHandler.handleRequest(Http2UpgradeHandler.java:102)
at io.undertow.server.handlers.DisallowedMethodsHandler.handleRequest(DisallowedMethodsHandler.java:61)
at io.undertow.server.Connectors.executeRootHandler(Connectors.java:360)
at io.undertow.server.protocol.http.HttpReadListener.handleEventWithNoRunningRequest(HttpReadListener.java:255)
at io.undertow.server.protocol.http.HttpReadListener.handleEvent(HttpReadListener.java:136)
at io.undertow.server.protocol.http.HttpOpenListener.handleEvent(HttpOpenListener.java:147)
at io.undertow.server.protocol.http.HttpOpenListener.handleEvent(HttpOpenListener.java:93)
at io.undertow.server.protocol.http.HttpOpenListener.handleEvent(HttpOpenListener.java:52)
at org.xnio.ChannelListeners.invokeChannelListener(ChannelListeners.java:92)
at org.xnio.ChannelListeners$10.handleEvent(ChannelListeners.java:291)
at org.xnio.ChannelListeners$10.handleEvent(ChannelListeners.java:286)
at org.xnio.ChannelListeners.invokeChannelListener(ChannelListeners.java:92)
at org.xnio.nio.QueuedNioTcpServer$1.run(QueuedNioTcpServer.java:131)
at org.xnio.nio.WorkerThread.safeRun(WorkerThread.java:612)
at org.xnio.nio.WorkerThread.run(WorkerThread.java:479)
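To narrow down where these attempts come from, it may help to pull the request method, target and User-Agent out of each UT005071 line. Undertow logs requests that it receives, so a CONNECT line with a non-HtmlUnit User-Agent (here PycURL) would suggest an outside client trying to use the server as a proxy, rather than an outgoing scraping call. Below is a minimal sketch, assuming the log format shown above; the class name `UndertowRequestLog` and its methods are hypothetical helpers, not part of Undertow or WildFly.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not an Undertow API): extracts fields from a
// "UT005071: Undertow request failed HttpServerExchange{ ... }" log line.
final class UndertowRequestLog {

    // Captures the HTTP method and the request target that follow "HttpServerExchange{".
    private static final Pattern REQUEST =
            Pattern.compile("HttpServerExchange\\{\\s*(\\S+)\\s+(\\S+)");

    // Captures the value inside "User-Agent=[...]".
    private static final Pattern USER_AGENT =
            Pattern.compile("User-Agent=\\[([^\\]]*)\\]");

    private UndertowRequestLog() {
    }

    // Returns the HTTP method of the logged request, e.g. CONNECT.
    static Optional<String> method(String line) {
        Matcher m = REQUEST.matcher(line);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Returns the request target, e.g. host:port for a CONNECT request.
    static Optional<String> target(String line) {
        Matcher m = REQUEST.matcher(line);
        return m.find() ? Optional.of(m.group(2)) : Optional.empty();
    }

    // Returns the User-Agent header sent by the client, if one was logged.
    static Optional<String> userAgent(String line) {
        Matcher m = USER_AGENT.matcher(line);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Assumption: a CONNECT request reaching an application server is treated
    // here as a likely proxy probe from an external client.
    static boolean looksLikeProxyProbe(String line) {
        return method(line).filter("CONNECT"::equals).isPresent();
    }
}
```

Running each error line through `looksLikeProxyProbe` and printing `target` and `userAgent` gives a quick list of which hosts were requested and by what kind of client. The log line itself does not contain the client's IP address, so that would have to come from another source such as an access log.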