How to parse HTML with jsoup

Date: 2016-05-14 20:45:20

Tags: android jsoup

I have this HTML:

<html>
<head>
<title>Try jsoup</title>
</head>
<body class="sin">
<div class="ks">
    <div class="wrap">

        <div class="mag-right-sidebar-wrap">
            <main class="mag">

                //A lot of unneeded tags

                <article class="post-1989009 post type-post post" itemscope="" itemtype="http://schema.org/CreativeWork">
                    <header class="post-header">
                        <h1 class="post-title" itemprop="headline">Knowledge nay</h1>
                        <img src="https://ohniee.com/wp-mag/uploads/avatars/1/djsy8933e89ufio8389e8-author-img.jpg" class="avatar user-1-avatar avatar-40 photo" width="40" height="40" alt="Profile photo of Johnnie Adams">

                        <div class="flip-meta" style="padding-top:3px; margin-left: 50px">
lorem ipsum <a href="/members/iyke"><span class="flip-author" itemprop="author" itemscope itemtype="http://schema.org/Person"><span class="flip-author-name" itemprop="name"> Johnnie Adams</span></span></a> <script>
document.write(" on June 1st, 2005 00:99 ")</script>  .  <span class="flip-comments-link"><a href="https://ohniee.com/lorem-ipsum">25 Comments</a></span>
</div>
                    </header>

                    //A lot of unneeded tags
</body>
</html>

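I am trying to extract "lorem ipsum Johnnie Adams on June 1st, 2005 00:99" from it, but what I get is "lorem ipsum Johnnie Adams . 25 Comments".

How can I get "lorem ipsum Johnnie Adams on June 1st, 2005 00:99" from the HTML?

This is the code I am using: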

document.select("div.flip-meta").first().text();

jsoup demo link: https://try.jsoup.org/~BAit4PmvqNcdVAKLBv4Yp4QrXYQ

1 answer:

Answer 0 (score: 1)

Modifying Stephen's answer:

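// Locate the inline <script> whose document.write(...) call emits the date.
Element script = document.select("div.flip-meta script").first();
if (script == null) {
    throw new RuntimeException("script element not found");
}

// Strip the document.write(" ... ") wrapper, leaving only the written string.
String scriptContent = script.html().replace("document.write(\"", "").replace("\")", "");

// text() skips script data; cut the visible text at the first '.', '?' or '!'.
String text1 = document.select("div.flip-meta").first().text();
String text2 = text1.replaceAll("\\s*[.?!].*", "");

String finaltext = text2 + scriptContent;

// urTextView is your own TextView.
urTextView.setText(finaltext);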

This should give you "lorem ipsum Johnnie Adams on June 1st, 2005 00:99".
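
The chained replace() calls depend on the script being written exactly as document.write("...") with no extra spaces. As a rough sketch of a slightly more tolerant variant, the string handling could be done with a regular expression instead. This swaps in java.util.regex for the replace() chain and is not part of the answer above; it uses only the JDK, so the two input strings in main are hard-coded stand-ins for what script.html() and text() would return for the HTML in the question. The class and method names (FlipMetaText, writtenText, combine) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative only.
class FlipMetaText {
    // Matches document.write("...") and captures the contents of the string literal.
    // Assumes a double-quoted literal with no escaped quotes inside it.
    private static final Pattern DOCUMENT_WRITE =
            Pattern.compile("document\\.write\\(\\s*\"([^\"]*)\"\\s*\\)");

    // Returns the argument of the first document.write("...") call, or "" if there is none.
    static String writtenText(String scriptSource) {
        Matcher m = DOCUMENT_WRITE.matcher(scriptSource);
        return m.find() ? m.group(1) : "";
    }

    // Cuts the visible text at the first '.', '?' or '!' (same regex as the answer),
    // appends the script-written text, and trims surrounding whitespace.
    static String combine(String visibleText, String scriptSource) {
        String prefix = visibleText.replaceAll("\\s*[.?!].*", "");
        return (prefix + writtenText(scriptSource)).trim();
    }

    public static void main(String[] args) {
        // Assumed inputs, copied from the question rather than produced by jsoup here.
        String visible = "lorem ipsum Johnnie Adams . 25 Comments";
        String script = "\ndocument.write(\" on June 1st, 2005 00:99 \")";
        System.out.println(combine(visible, script));
        // prints: lorem ipsum Johnnie Adams on June 1st, 2005 00:99
    }
}
```

In the Android code you would pass script.html() and the div's text() to combine(...) instead of the literals. Unlike the original, this version returns an empty string for the date part when no document.write call is found, and it trims the trailing space.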