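I am trying to build a simple application that scrapes bus times from my local bus company's website (they have no API, and this is a personal project).
I have used RESTEasy for JSON and XML responses in the past, but how do I return a deserialized representation of an HTML response?
Right now I can return a Response object like this: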
@GET
@Path("thebuscompany.com/{town}-{routeNumber}.htm")
@Produces({MediaType.TEXT_HTML})
@Consumes(MediaType.TEXT_HTML)
Response getGlassdoorCompany(@PathParam("town")String company, @PathParam("routeNumber") Integer length);
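However, it would be nice if I could create a dedicated response object for this resource.
How can I get RESTEasy to use Jackson (ObjectMapper) to deserialize the HTML into an object?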
Answer 0 (score: 0)
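Note: most municipal and city government web portals do not offer a simple JSON API, so if HTML is the only way to get the information you need, an HTML search package will be necessary. Jackson is a JSON library, so it will not help with most bus and subway timetable web portals. (developer.torello.directory/JavaHTML/index.html)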
import Torello.HTML.*;
import Torello.HTML.NodeSearch.*;
import java.io.*;
import java.util.*;
import java.net.*;

// Scrape schedule times from a local municipal bus service. Here (in Dallas) that would be DART.
public class Bus
{
    public static void main(String[] argv) throws IOException
    {
        // Dallas Area Rapid Transit Bus 402 schedule, westbound page URL.
        // Open this URL in a browser to see the rendered page, and make sure to look at the HTML source too.
        URL url = new URL("https://dart.org/schedules/w400ea.htm");

        // The page as a java.util.Vector of HTMLNode. Class HTMLNode has two subclasses:
        // TagNode (HTML elements) and TextNode (text on the page).
        Vector<HTMLNode> schedulePage = HTMLPage.getPageTokens(url, false);

        // Clicking "View Source" on the web page shows that the bus schedule sits inside an
        // element whose CSS class is "mainTable". The search below looks for <div> elements;
        // if the page uses a <table> instead, change the tag name passed to the search.
        Vector<Vector<HTMLNode>> schedule = InnerTagGetInclusive.all
            (schedulePage, "div", "class", TextComparitor.EQ_CI_TRM, "mainTable");

        Vector<Vector<String>> busStopTimes = new Vector<>();
        Vector<String> busStopNames = new Vector<>();

        for (Vector<HTMLNode> partialTable : schedule)
        {
            // Extracting the stop names and times is left to you. With the HTML-scrape package
            // routines it would probably take 10 to 20 lines of code.
            // Bus schedules will of course differ depending on the city you live in.
            // NOTE: The raw HTML tables are not easy for a human to read. Only after they are
            // rendered is the final schedule on the web page easy to understand.
            // There are sub-sub HTML tables to extract.
            // This code will compile, but it does not do anything yet. However, if HTML data
            // scraping is your goal, which is how I read your question, these routines are a
            // starter for you.
        }

        // Print the bus-stop table to terminal output.
        for (int i = 0; i < busStopTimes.size(); i++)
        {
            for (int j = 0; j < busStopTimes.elementAt(i).size(); j++)
                System.out.print(busStopTimes.elementAt(i).elementAt(j) + ' ');
            System.out.println();
        }
    }
}
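The loop in the example above is left empty, so here is a minimal sketch of the step it describes: turning the rows of an HTML table into lists of cell strings, the same shape as busStopTimes. It uses only java.util.regex from the JDK and not the Torello.HTML package. The class ScheduleTableSketch, the method parseRows, and the sample markup are all hypothetical names made up for illustration. It assumes a flat table with no nested tables, which the note above says is not true of the DART page, so treat it as a starting point and not a finished scraper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only. It is not part of Torello.HTML or RESTEasy.
// Assumption: the table is flat (no nested tables) and its markup is reasonably well formed.
final class ScheduleTableSketch
{
    // Matches one <tr>...</tr> block. DOTALL lets '.' span line breaks.
    private static final Pattern ROW =
        Pattern.compile("<tr\\b[^>]*>(.*?)</tr>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Matches one <td> or <th> cell inside a row.
    private static final Pattern CELL =
        Pattern.compile("<t[dh]\\b[^>]*>(.*?)</t[dh]>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Matches any tag left inside a cell, such as <b> or <a href="...">.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    // Returns one list of trimmed cell texts per table row; rows without cells are skipped.
    static List<List<String>> parseRows(String html)
    {
        List<List<String>> rows = new ArrayList<>();
        Matcher row = ROW.matcher(html);
        while (row.find())
        {
            List<String> cells = new ArrayList<>();
            Matcher cell = CELL.matcher(row.group(1));
            while (cell.find())
                cells.add(TAG.matcher(cell.group(1)).replaceAll("").trim());
            if (!cells.isEmpty())
                rows.add(cells);
        }
        return rows;
    }

    public static void main(String[] argv)
    {
        // Made-up sample markup; a real schedule page will look different.
        String html = "<table class=\"mainTable\">"
            + "<tr><th>Main St</th><th>Oak Ave</th></tr>"
            + "<tr><td>6:05</td><td>6:17</td></tr>"
            + "<tr><td><b>6:35</b></td><td>6:47</td></tr>"
            + "</table>";

        for (List<String> r : parseRows(html))
            System.out.println(String.join(" ", r));
    }
}
```

Regular expressions are fragile on real-world HTML, so for anything beyond a quick experiment a proper HTML parser is the safer choice. The resulting rows can then be copied into whatever dedicated response object you want to return.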