Question

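I'm trying to parse online dictionary results with Jsoup, which I only have a basic understanding of. I've posted the HTML I'm trying to parse. I want to grab the strings "baseball" and "beisebol", but I can't find a clean way to do it.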

Answer 1

This is a bit tricky, because neither word has a unique identifier in the source.

But this is how I would do it:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

Document doc = Jsoup.connect("http://myurl.com").get();
String original = doc.select("td[width=140]").get(1).text(); // second td with width=140 (index is zero-based); text() returns the cell text without markup
String translated = doc.select("td[align=left]").get(1).text(); // second td with align=left

Note: when you obtain data by scraping, even a small change to the site's design or source code can break your application.
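
One way to soften that fragility is to check that the expected cell exists before reading it. The following is only a minimal sketch under stated assumptions: SafeSelect and nth are hypothetical names, not part of Jsoup, and the list of strings stands in for the matched cells. Since Jsoup's Elements is a java.util.List of Element, the same helper should accept the result of doc.select(...).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class SafeSelect {

    // Hypothetical helper (not a Jsoup API): returns the item at `index`,
    // or Optional.empty() when the list is shorter than expected.
    static <T> Optional<T> nth(List<T> items, int index) {
        if (index < 0 || index >= items.size()) {
            return Optional.empty();
        }
        return Optional.ofNullable(items.get(index));
    }

    public static void main(String[] args) {
        // Stand-in for the text of the matched <td> cells (assumed data).
        List<String> cells = Arrays.asList("header", "baseball");

        System.out.println(nth(cells, 1).orElse("(not found)")); // prints "baseball"
        System.out.println(nth(cells, 5).orElse("(not found)")); // prints "(not found)"
    }
}
```

With Jsoup on the classpath, a call such as nth(doc.select("td[width=140]"), 1).map(Element::text) would give an empty Optional rather than throwing an IndexOutOfBoundsException if the page layout changes.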

Answer 2

Here is a solution using estivate (an annotation-based Java DOM parser that is compatible with JSoup):

Document doc = Jsoup.connect("http://myurl.com").get();

EstivateMapper mapper = new EstivateMapper();

Result result = mapper.map(doc, Result.class);

Define the Result class as follows:

public class Result {

    @Text(select = "td[width=140]", index=1)
    public String original;

    @Text(select = "td[align=left]", index=1)
    public String translated;

}
