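Parsing an XML file to extract specific text content

Time: 2014-04-29 21:42:54

Tags: java xml dom

I am parsing XML files that represent research papers/articles and storing them in a MySQL database from Java. The XML follows the structure below: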

  <article>
    <article-meta></article-meta>
    <body>
     <p> 
     Extensible Markup Language (XML) is a markup language that defines a set of
     rules for encoding documents in a format that is both human-readable and
     machine-readable <ref id="1"/>. It is defined in the XML 1.0 Specification produced by the
      W3C, and several other related specifications
      </p>
      <p>
       Many application programming interfaces (APIs) have been developed to aid 
      software developers with processing XML <ref id="2"/>. data, and several schema
       systems exist to aid in the definition of XML-based languages.
      </p>
    </body>
    <back>
      <ref-list>
         <ref id="1">Details about this reference</ref>
         <ref id="2">Details about this reference</ref>
       </ref-list>
     </back>
   </article>

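I am using a DOM parser to parse the files. One requirement is that, for each ref id, I have to extract the 150 characters to the left and to the right of the position where it is cited inside the body tag. How can I do this?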

     refId     leftText                  rightText
     1         150 chars on left side    150 chars on right side

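1 answer:

Answer 0: (score: 0)

Assuming you have already used DOM to get the <ref> element with id = 1 and its content value "Details about this reference", store the <ref> tag's content in a String variable. You can then use the substring method to get the characters on the left and on the right. That's it.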

String text = "Details about this reference";
String leftText = text.substring(0, 7); // first 7 chars from the left side
String rightText = text.substring(text.length() - 2); // last 2 chars from the right side; pass the count you need (e.g. 150) instead of 2

Result

leftText: Details rightText: ce

Note: before extracting, check that the string is at least 150 characters long. If it is shorter, substring throws a StringIndexOutOfBoundsException.
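The snippet above takes substrings of the reference text itself, whereas the question asks for the text surrounding each citation in the body. A minimal sketch of that variant follows, under several assumptions that are not in the original post: the class name `RefContext` and its `extract` method are hypothetical, each citation is written as a well-formed empty element such as `<ref id="1"/>` (a DOM parser rejects unquoted attributes and unclosed tags), the context is limited to the paragraph that encloses the citation, and runs of whitespace are collapsed to a single space before counting. `Math.max` and `Math.min` clamp the indices, so short paragraphs do not throw.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; the name and method signatures are illustrative only.
final class RefContext {

    /** Returns one {refId, leftText, rightText} row per <ref> cited inside <body>. */
    static List<String[]> extract(String xml, int window) {
        if (window < 0) {
            throw new IllegalArgumentException("window must not be negative");
        }
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        List<String[]> rows = new ArrayList<>();
        NodeList bodies = doc.getElementsByTagName("body");
        if (bodies.getLength() == 0) {
            return rows;
        }
        // Only <ref> elements under <body>; the entries in <back> are ignored.
        NodeList refs = ((Element) bodies.item(0)).getElementsByTagName("ref");
        for (int i = 0; i < refs.getLength(); i++) {
            Element ref = (Element) refs.item(i);
            StringBuilder left = new StringBuilder();
            StringBuilder right = new StringBuilder();
            // Siblings before the citation form the left context, siblings after it the right.
            for (Node n = ref.getPreviousSibling(); n != null; n = n.getPreviousSibling()) {
                left.insert(0, textOf(n));
            }
            for (Node n = ref.getNextSibling(); n != null; n = n.getNextSibling()) {
                right.append(textOf(n));
            }
            String l = normalize(left.toString());
            String r = normalize(right.toString());
            rows.add(new String[] {
                ref.getAttribute("id"),
                l.substring(Math.max(0, l.length() - window)), // last `window` chars, clamped
                r.substring(0, Math.min(window, r.length()))   // first `window` chars, clamped
            });
        }
        return rows;
    }

    // Text of text, CDATA and element nodes; comments and other node types contribute nothing.
    private static String textOf(Node n) {
        short type = n.getNodeType();
        boolean hasText = type == Node.TEXT_NODE
                || type == Node.CDATA_SECTION_NODE
                || type == Node.ELEMENT_NODE;
        return hasText ? n.getTextContent() : "";
    }

    // Collapses line breaks and indentation so they do not count toward the window.
    private static String normalize(String s) {
        return s.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String xml = "<article><body><p>XML is a markup language <ref id=\"1\"/>. "
                + "It is defined by the W3C.</p></body></article>";
        for (String[] row : extract(xml, 150)) {
            System.out.println(row[0] + " | " + row[1] + " | " + row[2]);
        }
    }
}
```

Each returned row maps directly onto the refId / leftText / rightText columns from the question, so it can be passed to whatever insert statement the database layer uses. If the context should span neighbouring paragraphs as well, the sibling walk would have to continue up through the parent elements, which this sketch does not attempt.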