How to check the hierarchical order of heading tags (h1-h6) using Jsoup

Asked: 2017-11-08 10:24:41

Tags: java jsoup

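1. How can I check that an h2 tag comes before an h3 tag?

2. How can I check that an h3 tag comes before an h4 tag, and so on? The check should cover the whole HTML page.

Here is my sample code.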

 <html>
 <head></head>
 <body>
 <h1>aaaaaaa</h1>
 <img src = "..." />
 <img src = "..." alt="information about image" />
 <input type="image" src="img_submit.gif"  />

 <h2>bbbbbbbbbbbbbb</h2>
 <input type="text" aria-autocomplete="both" role="textbox">
 <input type="text" aria-autocomplete="both">

  <h3>ccccccccc</h3>
   <div role="checkbox" aria-checked="true"></div>
  <div role="checkbox"></div>

  <div role="slider" aria-orientation="vertical"></div>
  <div role="slider" aria-orientation=""></div>
  <a href="www.google.com" aria-expanded="undefined"> venkatesh </a>
  <a href="www.google.com" aria-expanded="false"> venkatesh </a>
   </body>
   </html>

Above is my sample HTML page.

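import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// `input` is the File (or InputStream) for the HTML page, declared elsewhere.
public void headingHierarchy() {
    try {
        Document doc = Jsoup.parse(input, "UTF-8", "file:///C:/Users/PTGHYD/Desktop/testing.html");

        Elements elements = doc.select("h3");
        for (Element element : elements) {
            // previousElementSibling() returns the element directly before the h3
            // (or null if there is none), not the nearest preceding heading.
            Element previous = element.previousElementSibling();
            if (previous != null && previous.tagName().equals("h2")) {
                System.out.println("success");
            } else {
                System.out.println("error");
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}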

1 Answer:

Answer 0: (score: 0)

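I'm not entirely clear on what you are trying to do, but I think you want to walk through the HTML document and verify that the header elements (h1, h2, h3, h4, h5, h6) are in order, i.e. h6 is preceded by h5, h5 is preceded by h4, h4 is preceded by h3, and so on.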

Update, based on your comments: it is now clear that not only the ordering matters but also that there must be no gaps in the sequence, i.e. if the HTML document contains h2 then that element must be preceded by h1, h3 must be preceded by h2, and so on.

If so, then the following code will do this:

import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// this fails fast i.e. as soon as it finds an out-of-order header element, if you are interested in gathering and
// reporting all out-of-order header elements then you'll want to gather the failures in a collection and then
// include the failures in an exception which is thrown **after** you have iterated over all header tags
public void validateHeaderElementOrdering(String html) {
    Document document = Jsoup.parse(html);
    Elements headerElements = document.select("h1, h2, h3, h4, h5, h6");

    List<String> headerTags = new ArrayList<>();
    for (Element element : headerElements) {
        headerTags.add(element.nodeName());
    }

    // if there is only one entry then it must be h1 because (a) every header must be preceded by its parent and
    // (b) if there is only one header present then this header is not preceded by anything and (c) only h1 has
    // no parent
    if (headerTags.size() == 1) {
        String currentTag = headerTags.get(0);
        long currentTagPosition = Long.valueOf(currentTag.substring(1, 2));
        if (currentTagPosition != 1) {
            throw new RuntimeException("Header tags are out of order, there's only one header tag and it is not h1!");
        }
    }

    // now walk the headerTags and for each entry insist that it is preceded by the immediately prior header,
    // based on the ordering h6 -> h5 -> h4 -> h3 -> h2 -> h1
    for (int i = 1; i < headerTags.size(); i++) {
        String previousTag = headerTags.get(i - 1);
        String currentTag = headerTags.get(i);

        long currentTagPosition = Long.valueOf(currentTag.substring(1, 2));
        long previousTagPosition = Long.valueOf(previousTag.substring(1, 2));

        if (currentTagPosition != (previousTagPosition + 1)) {
            throw new RuntimeException(String.format("Header tags are out of order: %s came after %s", currentTag, previousTag));
        }
    }
}

The following test cases demonstrate the behaviour of this code:

import org.junit.Assert;
import org.junit.Test;

@Test
public void headerElementsWhichAreInOrderAreValid() {
    String html = "<html><body><h1></h1><h2></h2><h3></h3><h4></h4><h5></h5><h6></h6></body></html>";
    validateHeaderElementOrdering(html);
}

@Test
public void headerElementsWhichAreNotInOrderAreInvalid() {
    String html = "<html><body><h1></h1><h3></h3><h2></h2><h4></h4><h5></h5><h6></h6></body></html>";
    try {
        validateHeaderElementOrdering(html);
        Assert.fail("Expected out of order header elements to be deemed invalid!");
    } catch (RuntimeException ex) {
        Assert.assertEquals("Header tags are out of order: h3 came after h1", ex.getMessage());
    }
}

@Test
public void headerElementsWhichAreNotCompleteAndInOrderAreInvalid() {
    String html = "<html><body><h2></h2><h4></h4><h5></h5></body></html>";
    try {
        validateHeaderElementOrdering(html);
        Assert.fail("Expected out of order header elements to be deemed invalid!");
    } catch (RuntimeException ex) {
        Assert.assertEquals("Header tags are out of order: h4 came after h2", ex.getMessage());
    }
}

@Test
public void willValidateEvenWhenThereIsASingleHeaderElement() {
    String html = "<html><body><h3></h3></body></html>";
    try {
        validateHeaderElementOrdering(html);
        Assert.fail("Expected out of order header elements to be deemed invalid!");
    } catch (RuntimeException ex) {
        Assert.assertEquals("Header tags are out of order, there's only one header tag and it is not h1!", ex.getMessage());
    }
}
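
The answer's code comment mentions that the validator fails fast, and that reporting every out-of-order header means gathering the failures in a collection first. A minimal sketch of that variant might look like the code below. The class name HeadingOrderReport and the method collectOrderingFailures are hypothetical names introduced for illustration, and the sketch assumes the tag names have already been extracted into a list (for example from document.select("h1, h2, h3, h4, h5, h6") as in the answer), so it needs no Jsoup dependency itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class HeadingOrderReport {

    // Hypothetical helper: returns one message per violation instead of
    // throwing on the first one. An empty list means the order is valid.
    static List<String> collectOrderingFailures(List<String> headerTags) {
        List<String> failures = new ArrayList<>();

        // Same single-header rule as the answer: a lone header must be h1.
        if (headerTags.size() == 1 && level(headerTags.get(0)) != 1) {
            failures.add("there's only one header tag and it is not h1: " + headerTags.get(0));
        }

        // Same pairwise rule as the answer: each header must be exactly
        // one level below the header that precedes it.
        for (int i = 1; i < headerTags.size(); i++) {
            String previousTag = headerTags.get(i - 1);
            String currentTag = headerTags.get(i);
            if (level(currentTag) != level(previousTag) + 1) {
                failures.add(String.format("%s came after %s", currentTag, previousTag));
            }
        }
        return failures;
    }

    // Converts a tag name such as "h3" to its numeric level, 3.
    private static int level(String tag) {
        return Integer.parseInt(tag.substring(1));
    }

    public static void main(String[] args) {
        List<String> failures = collectOrderingFailures(Arrays.asList("h1", "h3", "h2", "h4"));
        if (!failures.isEmpty()) {
            System.out.println("Header tags are out of order: " + failures);
        }
    }
}
```

A caller that still wants an exception could throw a single RuntimeException containing the whole list once the loop has finished, which matches what the answer's comment suggests.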