This is part of my HTML:
<p>hello world </p>
<p><img class=\"aligncenter size-full wp-image-3197\" src=\"data:image/gif;base64,R0lGODdhAQABAPAAAP///wAAACwAAAAAAQABAEACAkQBADs=\" data-lazy-src=\"http://memaraneha.ir/wp-content/uploads/2016/12/harmony-02.jpg\" alt=\"harmony-02\" width=\"800\" height=\"450\" data-lazy-srcset=\"http://memaraneha.ir/wp-content/uploads/2016/12/harmony-02.jpg 800w, http://memaraneha.ir/wp-content/uploads/2016/12/harmony-02-300x169.jpg 300w\" sizes=\"(max-width: 800px) 100vw, 800px\" /><noscript><img class=\"aligncenter size-full wp-image-3197\" src=\"http://memaraneha.ir/wp-content/uploads/2016/12/harmony-02.jpg\" alt=\"harmony-02\" width=\"800\" height=\"450\" srcset=\"http://memaraneha.ir/wp-content/uploads/2016/12/harmony-02.jpg 800w, http://memaraneha.ir/wp-content/uploads/2016/12/harmony-02-300x169.jpg 300w\" sizes=\"(max-width: 800px) 100vw, 800px\" /></noscript></p>
<p>goodbye world</p>
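As you can see, the HTML contains three <p> tags. How can I select only the normal <p> tags with jsoup, such as hello world and goodbye world, and ignore the <p> tag that contains the img?
This is my code so far: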
public class MainActivity extends AppCompatActivity {
    public WebView webView;

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main_page);
        webView = (WebView) findViewById(R.id.webi);
        new AsyncTask<Void, Void, String>() {
            @Override
            protected String doInBackground(Void... voids) {
                String html = "";
                try {
                    Document document = Jsoup.connect("http://memaraneha.ir/%db%8c%da%a9%d9%be%d8%a7%d8%b1%da%86%da%af%db%8c-%d9%87%d9%85%d8%a7%d9%87%d9%86%da%af%db%8c-%d8%b7%d8%b1%d8%a7%d8%ad%db%8c-%d8%af%d8%a7%d8%ae%d9%84%db%8c/")
                            .timeout(20000).get();
                    Elements elements = document.select("div.base-box:nth-child(2)").select("p");
                    html = elements.toString();
                } catch (IOException e) {
                    e.printStackTrace();
                }
                return html;
            }

            @Override
            protected void onPostExecute(String html) {
                String mime = "text/html";
                String encoding = "utf-8";
                webView.loadDataWithBaseURL(null, html, mime, encoding, null);
            }
        }.execute();
    }
}
Answer 0 (score: 1)
You can avoid a loop and use the following:
Elements e = doc.select("p:not(:has(img))");
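To see what this selector does, here is a minimal sketch run against a simplified version of the three paragraphs from the question. The class name TextParagraphs, the method withoutImages, and the shortened sample HTML are assumptions made for illustration; only the selector itself comes from the answer.

```java
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class TextParagraphs {
    // Returns the text of every <p> that has no <img> descendant.
    // Hypothetical helper; only the selector string is taken from the answer.
    static List<String> withoutImages(String html) {
        Document doc = Jsoup.parse(html);
        // :has(img) matches a <p> containing an <img>; :not(...) inverts that.
        return doc.select("p:not(:has(img))").eachText();
    }

    public static void main(String[] args) {
        // Simplified stand-in for the HTML in the question.
        String html = "<p>hello world</p>"
                + "<p><img src=\"harmony-02.jpg\" alt=\"harmony-02\"></p>"
                + "<p>goodbye world</p>";
        for (String text : withoutImages(html)) {
            System.out.println(text);
        }
    }
}
```

In the question's code the same idea would presumably apply to the existing chain, for example by selecting "p:not(:has(img))" instead of "p" after the div.base-box selection.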
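Answer 1 (score: 0)
You can try something like this, which selects all <p> tags that have no <img> tag nested inside them:
Document document = Jsoup.connect(url).get(); // url: address of the page to fetch
Elements elements = new Elements();
for (Element e : document.select("p")) {
    if (e.select("img").isEmpty()) {
        elements.add(e);
    }
}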
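The loop can also be restricted to the container the question selects first. The following is a sketch under assumptions: the class ContainerParagraphs, the method textOnly, and the sample page layout (a header div followed by a div.base-box) are invented for illustration and are not the real structure of the page in the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class ContainerParagraphs {
    // Collects the <p> elements inside the given container that hold no <img>.
    // Hypothetical helper built around the loop from the answer.
    static Elements textOnly(Document document, String containerSelector) {
        Elements kept = new Elements();
        for (Element p : document.select(containerSelector).select("p")) {
            if (p.select("img").isEmpty()) {
                kept.add(p);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Assumed layout: div.base-box is the second child of <body>.
        String html = "<div class=\"header\">menu</div>"
                + "<div class=\"base-box\">"
                + "<p>hello world</p>"
                + "<p><img src=\"harmony-02.jpg\" alt=\"harmony-02\"></p>"
                + "<p>goodbye world</p>"
                + "</div>";
        Document document = Jsoup.parse(html);
        Elements kept = textOnly(document, "div.base-box:nth-child(2)");
        // The joined outer HTML is the kind of string the question passes to the WebView.
        System.out.println(kept.outerHtml());
    }
}
```

Compared with the single selector, the loop is longer but leaves room for extra per-paragraph checks if they are ever needed.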