Question

I am parsing XML files that represent research papers/articles and storing them in a MySQL database from Java. The XML follows the schema below:

  <article>
    <article-meta></article-meta>
    <body>
      <p>
        Extensible Markup Language (XML) is a markup language that defines a set of
        rules for encoding documents in a format that is both human-readable and
        machine-readable <ref id="1"/>. It is defined in the XML 1.0 Specification
        produced by the W3C, and several other related specifications
      </p>
      <p>
        Many application programming interfaces (APIs) have been developed to aid
        software developers with processing XML <ref id="2"/>. data, and several schema
        systems exist to aid in the definition of XML-based languages.
      </p>
    </body>
    <back>
      <ref-list>
        <ref id="1">Details about this reference </ref>
        <ref id="2">Details about this reference </ref>
      </ref-list>
    </back>
  </article>

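I am using a DOM parser to parse the file. One of the requirements is that, for each ref id, I have to extract 150 characters to the left and to the right of the position where it is cited inside the body tag. How can I do this?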

     refId     leftText                  rightText
     1         150 chars on left side    150 chars on right side

Answer 1

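Assuming you have already used DOM to get the <ref> element with id = 1 and its content value "Details about this reference", store the content of the <ref> tag in a String variable. You can then use the substring method to get the characters on the left and the characters on the right. That's it.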

String text = "Details about this reference";
String leftText = text.substring(0, 7);                // first 7 chars from the left side
String rightText = text.substring(text.length() - 2);  // last 2 chars from the right side; replace 2 with the count you need

Result

leftText:Details rightText:ce

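Note: check the string length before extracting. If the string is shorter than the number of characters requested (150 in the question), substring will throw a StringIndexOutOfBoundsException.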
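The question asks for the text around the place where each reference is cited in the body, so here is a minimal sketch of one way to combine the substring idea with a DOM walk. It rests on several assumptions that are not in the original post: the citations in the body are well-formed empty elements such as <ref id="1"/>, runs of whitespace are collapsed to a single space before counting, and the names RefContext, walk, left, right and extract are made up for illustration. The two helpers clamp the indexes with Math.max and Math.min, so a citation closer than 150 characters to the start or end of the body does not throw.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original answer.
class RefContext {

    // Up to n characters immediately before position pos, clamped at the start.
    static String left(String s, int pos, int n) {
        return s.substring(Math.max(0, pos - n), pos);
    }

    // Up to n characters starting at position pos, clamped at the end.
    static String right(String s, int pos, int n) {
        return s.substring(pos, Math.min(s.length(), pos + n));
    }

    // Appends the text under node in document order. Each <ref> element is
    // recorded as {id, offset in the collected text}; its own content is skipped.
    static void walk(Node node, StringBuilder text, List<String[]> refs) {
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.TEXT_NODE) {
                // Assumption: line breaks and indentation count as one space.
                text.append(c.getNodeValue().replaceAll("\\s+", " "));
            } else if (c.getNodeType() == Node.ELEMENT_NODE) {
                if ("ref".equals(c.getNodeName())) {
                    String id = ((Element) c).getAttribute("id");
                    refs.add(new String[] {id, String.valueOf(text.length())});
                } else {
                    walk(c, text, refs);
                }
            }
        }
    }

    // Returns one row {refId, leftText, rightText} per citation found in <body>.
    static List<String[]> extract(String xml, int n) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node body = doc.getElementsByTagName("body").item(0);
            StringBuilder text = new StringBuilder();
            List<String[]> refs = new ArrayList<>();
            walk(body, text, refs);
            String all = text.toString();
            List<String[]> rows = new ArrayList<>();
            for (String[] r : refs) {
                int pos = Integer.parseInt(r[1]);
                rows.add(new String[] {r[0], left(all, pos, n), right(all, pos, n)});
            }
            return rows;
        } catch (Exception e) {
            // Malformed XML or a missing <body> ends up here.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<article><body><p>XML is both human-readable and machine-readable "
                + "<ref id=\"1\"/>. It is defined in the XML 1.0 Specification.</p></body></article>";
        for (String[] row : extract(xml, 150)) {
            System.out.println(row[0] + " | " + row[1] + " | " + row[2]);
        }
    }
}
```

Each row can then be written to the MySQL table as refId, leftText and rightText. A reference cited more than once yields one row per citation. CDATA sections are ignored in this sketch, and the whitespace handling may need adjusting if exact character counts from the source file matter.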

Parsing an XML file to get specific text content
