Splitting a value into keys and values using Jsoup

Date: 2014-01-27 10:34:49

Tags: java jsoup

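I need to extract data in key/value format. I have already worked out how to fetch the page with Jsoup. Please guide me on how to split this HTML into keys and values: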

 <div class="text">
        Flamingnet Student Book Reviewer  LGen080812
        <br/>
             Aine (pronounced Ah-nee) has always thought of herself as a normal teenager in Depression-Era Alabama. With her blind brother Spenser, she lives in her grandmother's farmhouse, playing in the woods, reading books, and going to school--but never going outside the farmhouse other than school. But on the one day that their grandmother lets them go to town, Aine and Spenser return only to find that the person they call grandmother isn't actually their grandmother at all, and that she's been murdered. Not to mention that they've actually been living in a book all along. Soon Aine and Spenser are on the run from their grandma's killer, Biblos, with the legendary Gilgamesh. The two siblings hop from novel to novel as they embark on a quest to find three objects to save their world.
        <p>There were a number of things that hindered me from enjoying The Toadhouse Trilogy: Book One. First of all, the narrative was told in the present tense and in the third person. This is not automatically a bad thing for a book, but the style of the prose constructed really awkward sentences at times, eg. "The size of her failure feels epic." The premise for the book felt incredibly uninspired at times; I truly had a lot of trouble getting into and finishing the book. We also never find out what book Aine and Spenser were living in. Speaking of Aine and Spenser, I thought that their characters could be developed a bit better. However, Jess Lourey did do a wonderful job creating Gilgameshs character and using her vivid imagery. The Toadhouse Trilogy: Book One is not a bad book, but I wouldnt recommend it to my friends. </p>
        <p/>
        <p>Reviewer Age:13</p>
    </div>

Expected output:

  

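"id":"LGen080812" "text":"Aine (pronounced Ah-nee) has always thought of herself as a normal teenager in Depression-Era Alabama. With her blind brother Spenser, she lives in her grandmother's farmhouse, playing in the woods, reading books, and going to school--but never going outside the farmhouse other than school. But on the one day that their grandmother lets them go to town, Aine and Spenser return only to find that the person they call grandmother isn't actually their grandmother at all, and that she's been murdered. Not to mention that they've actually been living in a book all along. Soon Aine and Spenser are on the run from their grandma's killer, Biblos, with the legendary Gilgamesh. The two siblings hop from novel to novel as they embark on a quest to find three objects to save their world. There were a number of things that hindered me from enjoying The Toadhouse Trilogy: Book One. First of all, the narrative was told in the present tense and in the third person. This is not automatically a bad thing for a book, but the style of the prose constructed really awkward sentences at times, eg. "The size of her failure feels epic." The premise for the book felt incredibly uninspired at times; I truly had a lot of trouble getting into and finishing the book. We also never find out what book Aine and Spenser were living in. Speaking of Aine and Spenser, I thought that their characters could be developed a bit better. However, Jess Lourey did do a wonderful job creating Gilgameshs character and using her vivid imagery. The Toadhouse Trilogy: Book One is not a bad book, but I wouldnt recommend it to my friends." "rating":"13"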

My code:

String html = response.body();
Document document = Jsoup.parse(html);
String review = document.select("div[class=text]").last().text();
System.out.println(review);

1 answer:

Answer 0 (score: 0)

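The page from your comment seems to populate its content dynamically with JavaScript, and Jsoup cannot execute JavaScript. But judging by your code example, you are able to get the generated content of this site with
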
String html = response.body();

so I will focus on that part. If you are sure that the text in review will always have the format

Flamingnet Student Book Reviewer [id] [long text] Reviewer Age:[rating]

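then you can use regular expressions. The pattern below uses named capturing groups, which require Java 7 or later. You can try the following code: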

String html = response.body();
Document document = Jsoup.parse(html);
String review = document.select("div[class=text]").last().text();

Pattern p = Pattern
        .compile(
                "Flamingnet Student Book Reviewer\\s+(?<id>\\w+)\\s+(?<text>.+)\\s+Reviewer Age:(?<rating>\\d+)",
                Pattern.DOTALL);
Matcher m = p.matcher(review);
if (m.matches()){
    System.out.println("id = "+m.group("id"));
    System.out.println("text = "+m.group("text"));
    System.out.println("rating = "+m.group("rating"));
}else{
    System.out.println("Pattern doesn't match. Incorrect data.");
}

You should see something like

id = LGen080812
text = Aine (pronounced Ah-nee) has always thought of herself as a normal teenager in Depression-Era Alabama. With her blind brother Spenser, she lives in her grandmother's farmhouse, playing in the woods, reading books, and going to school--but never going outside the farmhouse other than school. But on the one day that their grandmother lets them go to town, Aine and Spenser return only to find that the person they call grandmother isn't actually their grandmother at all, and that she's been murdered. Not to mention that they've actually been living in a book all along. Soon Aine and Spenser are on the run from their grandma's killer, Biblos, with the legendary Gilgamesh. The two siblings hop from novel to novel as they embark on a quest to find three objects to save their world. There were a number of things that hindered me from enjoying The Toadhouse Trilogy: Book One. First of all, the narrative was told in the present tense and in the third person. This is not automatically a bad thing for a book, but the style of the prose constructed really awkward sentences at times, eg. "The size of her failure feels epic." The premise for the book felt incredibly uninspired at times; I truly had a lot of trouble getting into and finishing the book. We also never find out what book Aine and Spenser were living in. Speaking of Aine and Spenser, I thought that their characters could be developed a bit better. However, Jess Lourey did do a wonderful job creating Gilgameshs character and using her vivid imagery. The Toadhouse Trilogy: Book One is not a bad book, but I wouldnt recommend it to my friends.
rating = 13

Now you can use these values to build your output.
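
As a rough sketch of that last step, the captured groups could be collected into a map and printed in the "key":"value" layout from the question. The class name `ReviewParser` and the methods `parseReview` and `format` are hypothetical names chosen for this illustration, and the sample string in `main` is a shortened stand-in for the real review text. The sketch skips Jsoup and assumes you already have the string returned by `text()`; it also does no escaping of quotes inside the values, so it is not a JSON writer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Jsoup or of the original answer.
class ReviewParser {
    // Same pattern as in the answer above; named groups need Java 7 or later.
    private static final Pattern REVIEW = Pattern.compile(
            "Flamingnet Student Book Reviewer\\s+(?<id>\\w+)\\s+(?<text>.+)\\s+Reviewer Age:(?<rating>\\d+)",
            Pattern.DOTALL);

    // Returns id, text and rating in insertion order,
    // or an empty map if the input does not have the expected format.
    static Map<String, String> parseReview(String review) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = REVIEW.matcher(review.trim());
        if (m.matches()) {
            result.put("id", m.group("id"));
            result.put("text", m.group("text"));
            result.put("rating", m.group("rating"));
        }
        return result;
    }

    // Joins the entries as "key":"value" pairs separated by a space.
    // Assumption: values are printed verbatim, with no escaping.
    static String format(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append('"').append(e.getKey()).append("\":\"")
              .append(e.getValue()).append('"');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Shortened sample input standing in for document.select(...).last().text()
        String review = "Flamingnet Student Book Reviewer LGen080812 "
                + "Aine has always thought of herself as a normal teenager. "
                + "Reviewer Age:13";
        System.out.println(format(parseReview(review)));
    }
}
```

A `LinkedHashMap` is used here so the keys come out in the order id, text, rating. If you need real JSON, a JSON library would be a safer choice than manual string concatenation, because the review text itself contains double quotes.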