Extracting a value from XML served by a .jsp page

Time: 2011-11-11 19:28:54

Tags: java xml

monitorUrl is http://host03:8810/solr/admin/stats.jsp, which returns the following XML:

    <?xml-stylesheet type="text/xsl" href="stats.xsl"?>
<solr>

      <core></core> 

  <schema>test</schema>
  <host>domain.host.com</host>
  <now>Fri Nov 11 11:14:01 PST 2011</now>
  <start>Thu Sep 22 18:33:06 PDT 2011</start>
  <solr-info>

    <CORE>

    <entry>
      <name>
        core
      </name>
      <class>

      </class>
      <version>
        1.0
      </version>
      <description>
        SolrCore
      </description>
      <stats>

        <stat name="coreName" >

        </stat>

        <stat name="startTime" >
          Thu Sep 22 18:33:06 PDT 2011
        </stat>

        <stat name="refCount" >
          2
        </stat>

        <stat name="aliases" >
          []
        </stat>

      </stats>
    </entry>


    <entry>
      <name>
        searcher
      </name>
      <class>
        org.apache.solr.search.SolrIndexSearcher
      </class>
      <version>
        1.0
      </version>
      <description>
        index searcher
      </description>
      <stats>

        <stat name="searcherName" >
          Searcher@5b637a2d main
        </stat>

        <stat name="caching" >
          true
        </stat>

        <stat name="numDocs" >
          111959
        </stat>

        <stat name="maxDoc" >
          112310
        </stat>

        <stat name="reader" >
          DirectoryReader(segments_h0 _1zn:Cv101710/351 _1zl:Cv8026 _1zp:Cv2574)
        </stat>

        <stat name="readerDir" >
          org.apache.lucene.store.NIOFSDirectory@/es_idx_prd/projects/index/solr-agile/document/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@2c164804
        </stat>

        <stat name="indexVersion" >
          1313979005459
        </stat>

        <stat name="openedAt" >
          Fri Nov 11 11:00:04 PST 2011
        </stat>

        <stat name="registeredAt" >
          Fri Nov 11 11:00:04 PST 2011
        </stat>

        <stat name="warmupTime" >
          0
        </stat>

      </stats>
    </entry>

    </CORE>
     </solr-info>
</solr>

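I want to extract the numDocs value (111959) from the XML above. The fetchlog method below only reads the .jsp output line by line. How can I pick out the numDocs value while reading it that way?

monitorUrl points to a .jsp page whose output is XML.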

public void fetchlog() {
        InputStream is = null;
        FileOutputStream fos = null;
        try {
            is = HttpUtil.getFile(monitorUrl);
            BufferedReader in   = 
                new BufferedReader (new InputStreamReader (is));
            String line;
            while ((line = in.readLine()) != null) {
                if(line.contains("numDocs")) {
            //Extract numDocs value- How to do this?        
                }
                System.out.println(line);
            }

            fos = new FileOutputStream(buildTargetPath());
            IOUtils.copy(is, fos);
        } catch (FileNotFoundException e) {
            log.error("File Exception in fetching monitor logs :" + e);
        } catch (IOException e) {
            log.error("Exception in fetching monitor logs :" + e);
        }
    }

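2 answers:

Answer 0 (score: 1)

You can use Dom4J (or any XML parser) together with XPath. The example below uses the DOM and XPath APIs that ship with the JDK: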

import java.io.IOException;
import org.w3c.dom.*;
import org.xml.sax.SAXException;
import javax.xml.parsers.*;
import javax.xml.xpath.*;

public class XPathExample {

  public static void main(String[] args) 
   throws ParserConfigurationException, SAXException, 
          IOException, XPathExpressionException {

    DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
    domFactory.setNamespaceAware(true); // never forget this!
    DocumentBuilder builder = domFactory.newDocumentBuilder();
    // parse(String) takes a URI, so the stats page can be loaded directly.
    Document doc = builder.parse("http://host03:8810/solr/admin/stats.jsp");

    XPathFactory factory = XPathFactory.newInstance();
    XPath xpath = factory.newXPath();
    // numDocs is the value of the name attribute on a <stat> element,
    // not an element name, so select on the attribute.
    XPathExpression expr 
     = xpath.compile("//stat[@name='numDocs']");

    Object result = expr.evaluate(doc, XPathConstants.NODESET);
    NodeList nodes = (NodeList) result;
    for (int i = 0; i < nodes.getLength(); i++) {
        // getNodeValue() returns null for element nodes; read the text
        // content instead and trim the surrounding whitespace.
        System.out.println(nodes.item(i).getTextContent().trim()); 
    }
  }
}

http://www.ibm.com/developerworks/library/x-javaxpathapi/index.html
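
For trying the XPath approach without a running Solr instance, here is a minimal, self-contained sketch that runs against a trimmed copy of the XML from the question. The class name `NumDocsXPath`, the method `extractNumDocs`, and the `SAMPLE` constant are made up for illustration, and checked exceptions are wrapped in an unchecked one only to keep the example short.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Solr or of the question's code.
class NumDocsXPath {

    // Trimmed copy of the stats.jsp output shown in the question.
    static final String SAMPLE =
        "<solr><solr-info><CORE><entry><stats>\n"
        + "  <stat name=\"numDocs\" >\n"
        + "    111959\n"
        + "  </stat>\n"
        + "  <stat name=\"maxDoc\" >\n"
        + "    112310\n"
        + "  </stat>\n"
        + "</stats></entry></CORE></solr-info></solr>";

    // Returns the numDocs value; throws IllegalArgumentException if the
    // XML cannot be parsed or the stat is missing or not a number.
    static long extractNumDocs(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // normalize-space() strips the whitespace around the number.
            String value = xpath.evaluate(
                "normalize-space(//stat[@name='numDocs'])", doc);
            return Long.parseLong(value);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read numDocs", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("numDocs = " + extractNumDocs(SAMPLE));
    }
}
```

Using `normalize-space(...)` inside the expression means the value comes back as a plain string, so no `NodeList` loop or manual trimming is needed when only one stat is wanted.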

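Answer 1 (score: 1)

To follow up on my comment: if the XML/text layout is always the same, the value is on the line right after the tag, so you can do this: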

 if(line.contains("numDocs")) {
     // The value sits on the line after the <stat name="numDocs"> tag.
     String numDocs = in.readLine();
     if (numDocs != null) {
         // Trim the indentation around the number.
         System.out.println("Num docs:" + numDocs.trim());
     }
 }
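
The same idea as a small standalone sketch, so it can be tried without the HTTP call. The class `NumDocsLineReader` and its `extractNumDocs` method are hypothetical names, and the code assumes, as above, that the value always sits alone on the line after the opening tag.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper illustrating the read-the-next-line idea.
class NumDocsLineReader {

    // Scans line by line and returns the trimmed line that follows the
    // first line containing name="numDocs", or null if there is none.
    static String extractNumDocs(BufferedReader in) {
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.contains("name=\"numDocs\"")) {
                    String value = in.readLine();
                    return value == null ? null : value.trim();
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Three lines copied from the XML in the question.
        String sample =
            "        <stat name=\"numDocs\" >\n"
            + "          111959\n"
            + "        </stat>\n";
        BufferedReader in = new BufferedReader(new StringReader(sample));
        System.out.println("Num docs:" + extractNumDocs(in));
    }
}
```

This breaks as soon as the page puts the tag and the value on one line or inserts a blank line between them, which is why parsing the XML is the more robust choice.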