Question

I'm trying to build a simple application that scrapes bus times from my local bus company's website (they have no API; this is a personal project).

I have used REST Easy before for JSON and XML responses, but how do I get back a deserialized representation of an HTML response?

At the moment I can return a Response object like this:

@GET
@Path("thebuscompany.com/{town}-{routeNumber}.htm")
@Produces({MediaType.TEXT_HTML})
@Consumes(MediaType.TEXT_HTML)
Response getBusSchedule(@PathParam("town") String town, @PathParam("routeNumber") Integer routeNumber);

However, it would be great if I could create a response object dedicated to this resource.

How can I get REST Easy to use Jackson (ObjectMapper) to deserialize the HTML into an object?

Answer 1

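Note: most municipal and city government web portals do not offer a simple JSON output API, so if HTML is the only way to get the information you need, an HTML search package will be required. The Java Jackson API will not be of any use for most bus and subway timetable web portals. (developer.torello.directory/JavaHTML/index.html)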

import Torello.HTML.*;
import Torello.HTML.NodeSearch.*;

import java.io.*;
import java.util.*;
import java.net.*;

// Scrape schedule times from a local municipal bus service. Here (in Dallas) this would be DART.
public class Bus
{
    public static void main(String[] argv) throws IOException
    {
        // Dallas Area Rapid Transit Bus 402 Schedule - West Bound Page URL
        // Please open this URL in Google Chrome or I.E. to see the page, **and** make sure to look at the HTML source.
        URL url = new URL("https://dart.org/schedules/w400ea.htm");

        // Vectorized HTML page as a java.util.Vector. Class HTMLNode has 2 subclasses:
        // TagNode (HTML elements) and TextNode (text on the page).
        Vector<HTMLNode> schedulePage = HTMLPage.getPageTokens(url, false);

        // Clicking "View Source" on the web-page, shows the bus-schedule is inside of an "HTML Table"
        // Element, whose CSS "Class" is equal to "mainTable"
        Vector<Vector<HTMLNode>> schedule = InnerTagGetInclusive.all
                (schedulePage, "div", "class", TextComparitor.EQ_CI_TRM, "mainTable");

        Vector<Vector<String>>  busStopTimes = new Vector<>();
        Vector<String>          busStopNames = new Vector<>();

        for (Vector<HTMLNode> partialTable : schedule)
        {
            // Please review the HTML-Scrape package routines; this would probably take 10 to 20 lines of code.
            // Furthermore, bus schedules are (hopefully) *obviously* going to be different depending
            // upon the city in which you live.

            // NOTE: THE RAW HTML TABLES USED ARE NOT EASILY UNDERSTOOD BY A HUMAN.
            //       ONCE RENDERED, THE FINAL OUTPUT SCHEDULE (ON THE WEB-PAGE) IS EASILY
            //       UNDERSTOOD, BUT THERE ARE SUB-SUB HTML TABLES TO EXTRACT.

            // This code will compile, but not do anything yet.  However, if HTML-Data-Scrape is your
            // goal - which your question states it is (to me) - these routines are a "Starter" for you.
        }

        // Print the bus-stop table to terminal output
        for (int i=0; i < busStopTimes.size(); i++)
        {
            for (int j=0; j < busStopTimes.elementAt(i).size(); j++)
                System.out.print(busStopTimes.elementAt(i).elementAt(j) + ' ');
            System.out.println();
        }

    }
}
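
The loop body above is left empty, so here is a rough idea of what the "rows of cell text" step could look like. This is only a minimal sketch under stated assumptions: it uses plain java.util.regex instead of the Torello package (whose routines are not shown here), the class name ScheduleTableSketch and the method parseRows are hypothetical, and the sample HTML is invented for illustration. Regular expressions do not handle nested tables, which the note above says real schedule pages contain, so treat this as a starting point for flat tables only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: not part of the Torello package or of REST Easy.
class ScheduleTableSketch
{
    // Matches one table row and captures everything between <tr ...> and </tr>.
    private static final Pattern ROW =
        Pattern.compile("<tr\\b[^>]*>(.*?)</tr>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Matches one header or data cell and captures its inner HTML.
    private static final Pattern CELL =
        Pattern.compile("<t[dh]\\b[^>]*>(.*?)</t[dh]>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns one list of cell texts per table row, in document order.
    // Assumes a flat table: nested <table> elements will confuse the row pattern.
    static List<List<String>> parseRows(String html)
    {
        List<List<String>> rows = new ArrayList<>();
        Matcher row = ROW.matcher(html);
        while (row.find())
        {
            List<String> cells = new ArrayList<>();
            Matcher cell = CELL.matcher(row.group(1));
            while (cell.find())
            {
                // Drop any tags nested inside the cell, then trim whitespace.
                cells.add(cell.group(1).replaceAll("<[^>]*>", "").trim());
            }
            if (!cells.isEmpty()) rows.add(cells);
        }
        return rows;
    }

    public static void main(String[] args)
    {
        // Invented sample markup; a real schedule page will look different.
        String html = "<table class=\"mainTable\">"
            + "<tr><th>Stop</th><th>Time</th></tr>"
            + "<tr><td>Main St</td><td> 7:05 </td></tr>"
            + "</table>";

        for (List<String> r : parseRows(html))
            System.out.println(String.join(" ", r));
    }
}
```

Once the rows are plain strings, copying them into a class dedicated to the resource (for example one holding the stop names and the times per stop) is ordinary Java; Jackson is not involved at any point, which matches the note at the top of this answer.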

Returning an HTML response object with REST Easy?
