Jsoup works locally but not on Heroku

Time: 2019-06-15 01:10:17

Tags: java maven jsp heroku jsoup

I'm building a simple Java Maven web application that is essentially a small search engine based on the Gamespot website. It uses Jsoup to handle the web scraping.

A snippet from my servlet file (Search.java):

    // NOTE: a servlet instance is shared across requests, so this field is not thread-safe
    public ArrayList<Game> results = new ArrayList<Game>();

    protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        // Search term entered by the user
        String s = request.getParameter("searchQuery").trim();

        results.clear();

        search(s);

        //Parsed data will be passed from this servlet to search.jsp to be displayed
        String json = new Gson().toJson(results);
        response.setContentType("application/json");
        response.setCharacterEncoding("UTF-8");
        response.getWriter().write(json);


    }

    protected void search(String query) {
        // Strings are immutable, so the result of the replacement must be assigned back
        if(query.contains(" ")) {
            query = query.replace(" ", "+");
            if(query.endsWith("+")) {
                query = query.substring(0, query.length() - 1);
            }
        }


        try {
            Document doc = Jsoup.connect("http://www.gamespot.com/search/?i=site&q=" + query.toLowerCase())
                        .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.80 Safari/537.36").get();
            Elements div = doc.select("div.media-img.imgflare--boxart");
            Elements images = div.select("img");
            for(Element image : images) {
                Game g = new Game();
                g.setImage(image.attr("src"));
                g.setName(image.attr("alt"));
                results.add(g);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }

    }
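
The space handling in `search` can also be delegated to `java.net.URLEncoder`, which encodes spaces as `+` and escapes other reserved characters as well. The following is only a minimal sketch, not the original servlet code: the class name `SearchUrlBuilder` and the method `buildSearchUrl` are hypothetical, and the base URL is copied from the `Jsoup.connect(...)` call above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Locale;

// Hypothetical helper; the names are illustrative and not from the original servlet.
class SearchUrlBuilder {
    // Base URL copied from the Jsoup.connect(...) call in the servlet.
    private static final String BASE = "http://www.gamespot.com/search/?i=site&q=";

    static String buildSearchUrl(String rawQuery) {
        String normalized = rawQuery.trim().toLowerCase(Locale.ROOT);
        try {
            // URLEncoder applies form encoding: spaces become '+', reserved characters become %XX.
            return BASE + URLEncoder.encode(normalized, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every JVM, so this should be unreachable.
            throw new IllegalStateException(e);
        }
    }
}
```

For example, `SearchUrlBuilder.buildSearchUrl("  Legend of Zelda ")` yields a URL ending in `q=legend+of+zelda`, and the result could be passed straight to `Jsoup.connect(...)`.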

When I run it locally in Eclipse, it works perfectly, with no errors in the console.

But when I deploy this code to Heroku, the page is essentially blank and shows no search results.

When I open the logs, I see this stack trace:

2019-06-15T00:07:16.508901+00:00 app[web.1]: org.jsoup.HttpStatusException: HTTP error fetching URL. Status=403, URL=https://www.gamespot.com/search/?i=site&q=zelda
2019-06-15T00:07:16.508989+00:00 app[web.1]:    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:776)
2019-06-15T00:07:16.509047+00:00 app[web.1]:    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:773)
2019-06-15T00:07:16.509072+00:00 app[web.1]:    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:722)
2019-06-15T00:07:16.509094+00:00 app[web.1]:    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:306)
2019-06-15T00:07:16.509118+00:00 app[web.1]:    at org.jsoup.helper.HttpConnection.get(HttpConnection.java:295)
2019-06-15T00:07:16.509146+00:00 app[web.1]:    at com.GameSearchWebApp.Controller.Search.search(Search.java:94)
2019-06-15T00:07:16.509164+00:00 app[web.1]:    at com.GameSearchWebApp.Controller.Search.doGet(Search.java:59)
2019-06-15T00:07:16.509200+00:00 app[web.1]:    at javax.servlet.http.HttpServlet.service(HttpServlet.java:635)
2019-06-15T00:07:16.509221+00:00 app[web.1]:    at javax.servlet.http.HttpServlet.service(HttpServlet.java:742)
2019-06-15T00:07:16.509241+00:00 app[web.1]:    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
2019-06-15T00:07:16.509264+00:00 app[web.1]:    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
2019-06-15T00:07:16.509289+00:00 app[web.1]:    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:199)
2019-06-15T00:07:16.509337+00:00 app[web.1]:    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
2019-06-15T00:07:16.509356+00:00 app[web.1]:    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:493)
2019-06-15T00:07:16.509377+00:00 app[web.1]:    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:140)
2019-06-15T00:07:16.509399+00:00 app[web.1]:    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:81)
2019-06-15T00:07:16.509420+00:00 app[web.1]:    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87)
2019-06-15T00:07:16.509442+00:00 app[web.1]:    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:342)
2019-06-15T00:07:16.509477+00:00 app[web.1]:    at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:800)
2019-06-15T00:07:16.509495+00:00 app[web.1]:    at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
2019-06-15T00:07:16.509517+00:00 app[web.1]:    at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:806)
2019-06-15T00:07:16.509539+00:00 app[web.1]:    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1498)
2019-06-15T00:07:16.509566+00:00 app[web.1]:    at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
2019-06-15T00:07:16.509590+00:00 app[web.1]:    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
2019-06-15T00:07:16.509611+00:00 app[web.1]:    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
2019-06-15T00:07:16.509632+00:00 app[web.1]:    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
2019-06-15T00:07:16.509654+00:00 app[web.1]:    at java.lang.Thread.run(Thread.java:748)

This line:

2019-06-15T00:07:16.509146+00:00 app[web.1]:    at com.GameSearchWebApp.Controller.Search.search(Search.java:94)

refers to this line in my code:

Document doc = Jsoup.connect("http://www.gamespot.com/search/?i=site&q=" + query.toLowerCase())
                        .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.80 Safari/537.36").get();

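Is this a Heroku problem, or do I need to add something to that line of code to make it work? The only answer I've seen for this 403 error is to add a User-Agent, but I'm clearly already setting one here.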
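
A 403 that persists with a browser-like User-Agent could still come from the target site rejecting the request for other reasons, such as the origin IP address or missing headers; that is a possibility, not something the log above confirms. To gather more evidence, a minimal sketch like the one below reports the status code and the first part of the response body for a given URL, so the output on Heroku can be compared with the output locally. It uses the JDK's `HttpURLConnection` rather than Jsoup so that it is self-contained; `StatusProbe`, `Result`, and `probe` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper; not part of the original application.
class StatusProbe {
    /** HTTP status code plus the first bytes of the response body, decoded as UTF-8. */
    static final class Result {
        final int status;
        final String bodyStart;

        Result(int status, String bodyStart) {
            this.status = status;
            this.bodyStart = bodyStart;
        }
    }

    static Result probe(String url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        try {
            int status = conn.getResponseCode();
            // For 4xx/5xx responses the body is on the error stream, which may be null.
            InputStream in = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
            return new Result(status, readPrefix(in, 500));
        } finally {
            conn.disconnect();
        }
    }

    // Reads at most maxBytes from the stream and closes it.
    private static String readPrefix(InputStream in, int maxBytes) throws IOException {
        if (in == null) {
            return "";
        }
        try (InputStream stream = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            int n;
            while (out.size() < maxBytes
                    && (n = stream.read(buf, 0, Math.min(buf.length, maxBytes - out.size()))) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

Logging `result.status` and `result.bodyStart` from the deployed app, with the same URL and User-Agent string as the Jsoup call, shows whether the server returns an explanatory error page along with the 403.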

0 answers:

No answers yet