I have this HTML:
<html>
<head>
<title>Try jsoup</title>
</head>
<body class="sin">
<div class="ks">
<div class="wrap">
<div class="mag-right-sidebar-wrap">
<main class="mag">
//A lot of unneeded tags
<article class="post-1989009 post type-post post" itemscope="" itemtype="http://schema.org/CreativeWork">
<header class="post-header">
<h1 class="post-title" itemprop="headline">Knowledge nay</h1>
<img src="https://ohniee.com/wp-mag/uploads/avatars/1/djsy8933e89ufio8389e8-author-img.jpg" class="avatar user-1-avatar avatar-40 photo" width="40" height="40" alt="Profile photo of Johnnie Adams">
<div class="flip-meta" style="padding-top:3px; margin-left: 50px">
lorem ipsum <a href="/members/iyke"><span class="flip-author" itemprop="author" itemscope itemtype="http://schema.org/Person"><span class="flip-author-name" itemprop="name"> Johnnie Adams</span></span></a> <script>
document.write(" on June 1st, 2005 00:99 ")</script> . <span class="flip-comments-link"><a href="https://ohniee.com/lorem-ipsum">25 Comments</a></span>
</div>
</header>
//A lot of unneeded tags
</body>
</html>
I'm trying to extract "lorem ipsum Johnnie Adams on June 1st, 2005 00:99", but I get "lorem ipsum Johnnie Adams . 25 Comments" instead.
How can I get "lorem ipsum Johnnie Adams on June 1st, 2005 00:99" from this HTML?
This is the code I'm using:
document.select("div.flip-meta").first().text();
Jsoup demo link: https://try.jsoup.org/~BAit4PmvqNcdVAKLBv4Yp4QrXYQ
Answer:
A modified version of Stephen's answer:
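// Element.text() skips <script> contents, so read the script separately
Element script = document.select("div.flip-meta script").first();
if (script == null) {
    throw new RuntimeException("script element not found");
}
// data() returns the raw script source; strip the document.write("...") wrapper
String scriptContent = script.data()
        .replace("document.write(\"", "")
        .replace("\")", "")
        .trim();
// text1 is "lorem ipsum Johnnie Adams . 25 Comments"
String text1 = document.select("div.flip-meta").first().text();
// Remove everything from the first '.', '?' or '!' onward (" . 25 Comments")
String text2 = text1.replaceAll("\\s*[.?!].*", "");
String finaltext = text2 + " " + scriptContent;
urTextView.setText(finaltext); // urTextView: your own Android TextView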
This should give you "lorem ipsum Johnnie Adams on June 1st, 2005 00:99".
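The two replace() calls in the answer only work when the script body is exactly document.write("..."), and the regex cuts at the first '.', '?' or '!' it finds. A slightly more tolerant variant is sketched below in plain Java. It pulls the document.write argument out with a regular expression and cuts the comment count at the " . " separator instead. It only covers the string handling: it assumes you have already obtained text() of div.flip-meta and data() of its <script> with jsoup, and the class and method names (MetaText, extractWriteArgument, stripComments, combine) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: rebuilds "author on date" from the two strings jsoup
 * gives you for div.flip-meta (its text() and its script's data()).
 */
public class MetaText {

    // Captures the string literal passed to document.write("..."), tolerating
    // whitespace inside the parentheses. Assumes a double-quoted literal with
    // no escaped quotes inside it.
    private static final Pattern WRITE_ARG =
            Pattern.compile("document\\.write\\(\\s*\"(.*?)\"\\s*\\)", Pattern.DOTALL);

    /** Returns the trimmed document.write argument, or "" if there is none. */
    static String extractWriteArgument(String scriptSource) {
        Matcher m = WRITE_ARG.matcher(scriptSource);
        return m.find() ? m.group(1).trim() : "";
    }

    /** Keeps only the text before the " . " separator that precedes "25 Comments". */
    static String stripComments(String metaText) {
        int cut = metaText.indexOf(" . ");
        return (cut >= 0 ? metaText.substring(0, cut) : metaText).trim();
    }

    /** Joins the author part and the date part with a single space. */
    static String combine(String metaText, String scriptSource) {
        String date = extractWriteArgument(scriptSource);
        String author = stripComments(metaText);
        return date.isEmpty() ? author : author + " " + date;
    }

    public static void main(String[] args) {
        // Sample inputs: what text() and data() are expected to return for the question's HTML
        String metaText = "lorem ipsum Johnnie Adams . 25 Comments";
        String scriptSource = "\ndocument.write(\" on June 1st, 2005 00:99 \")";
        // prints "lorem ipsum Johnnie Adams on June 1st, 2005 00:99"
        System.out.println(combine(metaText, scriptSource));
    }
}
```

Cutting at " . " rather than at the first punctuation mark keeps a name such as "J. Adams" intact (the regex in the answer would stop at "J"), but it does assume the separator looks exactly as it does in this HTML.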