How do I access the text content of the next tag in XML?

Time: 2017-06-22 11:44:26

Tags: java xml

I have the following code:

    public String depRel() throws SAXException, IOException,
        ParserConfigurationException, ClassNotFoundException,
        ClassCastException {
    String xmlString = Features.dependencyGraph();

    String result = "";
    String dependent = "";
    String governor = "";
    String type = "";

    // System.out.println("A value is :" + xmlString);
    // convert the string here so that it can be read as XML
    Document document = convertStringToDocument(xmlString);
    document.getDocumentElement().normalize();
    Element root = document.getDocumentElement();
    NodeList nList = document.getElementsByTagName("dependencies");
    for (int temp = 0; temp < nList.getLength(); temp++) {
        Node node = nList.item(temp);
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element eElement1 = (Element) node;

        }
        NodeList nodesDocPart = node.getChildNodes();
        for (int temp2 = 0; temp2 < nodesDocPart.getLength(); temp2++) {

            Node n = nodesDocPart.item(temp2);

            if (n.getNodeType() == Node.ELEMENT_NODE) {
                Element el1 = (Element) n;
                type = el1.getAttribute("type");
            }

            // /////////////////////////////////////////////////sentence/////////////////////////////////////////////
            NodeList nodesSentencePart = n.getChildNodes();
            for (int temp3 = 0; temp3 < nodesSentencePart.getLength(); temp3++) {
                Node sentence = nodesSentencePart.item(temp3);
                if (sentence.getNodeType() == Node.ELEMENT_NODE) {

                    Element eElement4 = (Element) sentence;
                    if (eElement4.getTagName().equals("dependent")) {
                        dependent = eElement4.getTextContent();
                    }
                    if (eElement4.getTagName().equals("governor")) {
                        governor = eElement4.getTextContent();



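The following XML describes the dependency graph of a sentence. The sentence is: The production of human immunodeficiency virus type 1 (HIV-1) progeny was followed in the U937 promonocytic cell line after stimulation either with retinoic acid or PMA, and in purified human monocytes and macrophages.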

<dependencies style="typed">
  <dep type="det">
    <governor idx="2">production</governor>
    <dependent idx="1">The</dependent>
  </dep>
  <dep type="nsubjpass">
    <governor idx="14">followed</governor>
    <dependent idx="2">production</dependent>
  </dep>
  <dep type="case">
    <governor idx="7">type</governor>
    <dependent idx="3">of</dependent>
  </dep>
  <dep type="amod">
    <governor idx="7">type</governor>
    <dependent idx="4">human</dependent>
  </dep>
  <dep type="compound">
    <governor idx="7">type</governor>
    <dependent idx="5">immunodeficiency</dependent>
  </dep>
  <dep type="compound">
    <governor idx="7">type</governor>
    <dependent idx="6">virus</dependent>
  </dep>
  <dep type="nmod:of">
    <governor idx="2">production</governor>
    <dependent idx="7">type</dependent>
  </dep>
  <dep type="nummod">
    <governor idx="7">type</governor>
    <dependent idx="8">1</dependent>
  </dep>
  <dep type="punct">
    <governor idx="10">HIV-1</governor>
    <dependent idx="9">-LRB-</dependent>
  </dep>
  <dep type="appos">
    <governor idx="7">type</governor>
    <dependent idx="10">HIV-1</dependent>
  </dep>
  <dep type="punct">
    <governor idx="10">HIV-1</governor>
    <dependent idx="11">-RRB-</dependent>
  </dep>
  <dep type="dep">
    <governor idx="7">type</governor>
    <dependent idx="12">progeny</dependent>
  </dep>
  <dep type="auxpass">
    <governor idx="14">followed</governor>
    <dependent idx="13">was</dependent>
  </dep>
  <dep type="case">
    <governor idx="20">line</governor>
    <dependent idx="15">in</dependent>
  </dep>
  <dep type="det">
    <governor idx="20">line</governor>
    <dependent idx="16">the</dependent>
  </dep>
  <dep type="compound">
    <governor idx="20">line</governor>
    <dependent idx="17">U937</dependent>
  </dep>
  <dep type="amod">
    <governor idx="20">line</governor>
    <dependent idx="18">promonocytic</dependent>
  </dep>
  <dep type="compound">
    <governor idx="20">line</governor>
    <dependent idx="19">cell</dependent>
  </dep>
  <dep type="nmod:in">
    <governor idx="14">followed</governor>
    <dependent idx="20">line</dependent>
  </dep>
  <dep type="case">
    <governor idx="22">stimulation</governor>
    <dependent idx="21">after</dependent>
  </dep>
  <dep type="nmod:after">
    <governor idx="14">followed</governor>
    <dependent idx="22">stimulation</dependent>
  </dep>
  <dep type="dep">
    <governor idx="26">acid</governor>
    <dependent idx="23">either</dependent>
  </dep>
  <dep type="case">
    <governor idx="26">acid</governor>
    <dependent idx="24">with</dependent>
  </dep>
  <dep type="amod">
    <governor idx="26">acid</governor>
    <dependent idx="25">retinoic</dependent>
  </dep>
  <dep type="nmod:with">
    <governor idx="22">stimulation</governor>
    <dependent idx="26">acid</dependent>
  </dep>
  <dep type="cc">
    <governor idx="26">acid</governor>
    <dependent idx="27">or</dependent>
  </dep>
  <dep type="nmod:with">
    <governor idx="22">stimulation</governor>
    <dependent idx="28">PMA</dependent>
  </dep>
  <dep type="conj:or">
    <governor idx="26">acid</governor>
    <dependent idx="28">PMA</dependent>
  </dep>
  <dep type="punct">
    <governor idx="14">followed</governor>
    <dependent idx="29">,</dependent>
  </dep>
  <dep type="cc">
    <governor idx="14">followed</governor>
    <dependent idx="30">and</dependent>
  </dep>
  <dep type="case">
    <governor idx="34">monocytes</governor>
    <dependent idx="31">in</dependent>
  </dep>
  <dep type="amod">
    <governor idx="34">monocytes</governor>
    <dependent idx="32">purified</dependent>
  </dep>
  <dep type="amod">
    <governor idx="34">monocytes</governor>
    <dependent idx="33">human</dependent>
  </dep>
  <dep type="conj:and">
    <governor idx="14">followed</governor>
    <dependent idx="34">monocytes</dependent>
  </dep>
  <dep type="cc">
    <governor idx="34">monocytes</governor>
    <dependent idx="35">and</dependent>
  </dep>
  <dep type="conj:and">
    <governor idx="14">followed</governor>
    <dependent idx="36">macrophages</dependent>
  </dep>
  <dep type="conj:and">
    <governor idx="34">monocytes</governor>
    <dependent idx="36">macrophages</dependent>
  </dep>
  <dep type="punct">
    <governor idx="14">followed</governor>
    <dependent idx="37">.</dependent>
  </dep>
</dependencies>

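When I am at the "governor" tag, how can I access the "dependent" tag that follows it? I want to get all the governors and all the dependents for a given word. How can I do that?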

1 answer:

Answer 0 (score: 0)

It seems you want a collection of governor/dependent/word entries. You can use the following code to get such a collection; I call the class GovernorDependentNode.

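import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class GovernorDependentNode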
{
    Node governor;
    Node dependent;
    String word;
}

List<GovernorDependentNode> getNodes( String word, InputSource is )
{
    List<GovernorDependentNode> gdNodes = new ArrayList<GovernorDependentNode>();
    try
    {

        Object govs = XPathFactory.newInstance().newXPath().evaluate("//dep/governor[.='" + word + "']", is, XPathConstants.NODESET );
        if ( govs != null )
        {
            NodeList gNodes = (NodeList)govs;
            for ( int i = 0; i < gNodes.getLength(); i++ )
            {
                GovernorDependentNode gdNode = new GovernorDependentNode();
                Node gNode = gNodes.item(i);
                gdNode.governor = gNode;
                gdNode.word = word;
                NodeList childNodes = gNode.getParentNode().getChildNodes();
                for ( int j = 0; j < childNodes.getLength(); j++ )
                {
                    Node n = childNodes.item(j);
                    if ( n.getNodeName().equals( "dependent" ) )
                    {
                        gdNode.dependent = n;
                        break;
                    }
                }
                gdNodes.add( gdNode );

            }
        }
    }
    catch ( Exception e )
    {
        e.printStackTrace();
    }

    return gdNodes;
}

Use the method like this:

InputSource is = new InputSource( new StringReader( xmlString ) );
List<GovernorDependentNode> nodes = getNodes( "yourWord", is );

The method getNodes first uses the XPath //dep/governor[.='word'] to get the governor nodes for the given word.
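The XPath can also step from a governor to the dependent next to it, using the following-sibling axis. Below is a minimal sketch of that idea, not part of the original answer: the class name DependentsByXPath and the method dependentsOf are hypothetical, and the word is bound as an XPath variable ($word) instead of being concatenated, which should also cope with words that contain a quote character.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named here only for illustration.
public class DependentsByXPath {

    /** Returns the text of every dependent whose governor equals the given word. */
    static List<String> dependentsOf(String word, String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind $word as a variable instead of concatenating it into the expression.
        xpath.setXPathVariableResolver(name ->
                "word".equals(name.getLocalPart()) ? word : null);
        // Select each matching governor, then move to its dependent sibling.
        NodeList nodes = (NodeList) xpath.evaluate(
                "//dep/governor[. = $word]/following-sibling::dependent",
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Two entries copied from the question's XML, shortened for the demo.
        String xml = "<dependencies style=\"typed\">"
                + "<dep type=\"nsubjpass\"><governor idx=\"14\">followed</governor>"
                + "<dependent idx=\"2\">production</dependent></dep>"
                + "<dep type=\"auxpass\"><governor idx=\"14\">followed</governor>"
                + "<dependent idx=\"13\">was</dependent></dep>"
                + "</dependencies>";
        System.out.println(dependentsOf("followed", xml));
    }
}
```

This variant returns only the dependent words. If the governor node itself is needed as well, the loop in getNodes remains the more direct route.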

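There may be several of them; for example, the word followed has 9 governor nodes. For each of these nodes the method therefore looks up the dependent node and builds an object that holds the governor, the dependent node, and the given word.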
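If the goal is simply to move from a governor element to the dependent element that follows it, plain DOM sibling navigation may be enough. The sketch below is an illustration under that assumption, not the answer's method: the class DependencyPairs and its methods are hypothetical names, and it presumes each dep element contains a governor followed by a dependent, as in the XML from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named here only for illustration.
public class DependencyPairs {

    /** Returns the next sibling of the given node that is an element, or null. */
    static Element nextElementSibling(Node node) {
        Node sibling = node.getNextSibling();
        // Whitespace between tags shows up as text nodes, so skip those.
        while (sibling != null && sibling.getNodeType() != Node.ELEMENT_NODE) {
            sibling = sibling.getNextSibling();
        }
        return (Element) sibling;
    }

    /** Collects "type(governor, dependent)" for every governor element. */
    static List<String> pairs(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> result = new ArrayList<>();
        NodeList governors = doc.getElementsByTagName("governor");
        for (int i = 0; i < governors.getLength(); i++) {
            Element governor = (Element) governors.item(i);
            Element dependent = nextElementSibling(governor);
            if (dependent == null || !dependent.getTagName().equals("dependent")) {
                continue; // no dependent after this governor, skip the entry
            }
            // Assumes the governor sits directly inside a dep element.
            Element dep = (Element) governor.getParentNode();
            result.add(dep.getAttribute("type") + "("
                    + governor.getTextContent() + ", "
                    + dependent.getTextContent() + ")");
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Two entries copied from the question's XML, shortened for the demo.
        String xml = "<dependencies style=\"typed\">"
                + "<dep type=\"det\"><governor idx=\"2\">production</governor>"
                + "<dependent idx=\"1\">The</dependent></dep>"
                + "<dep type=\"nsubjpass\"><governor idx=\"14\">followed</governor>"
                + "<dependent idx=\"2\">production</dependent></dep>"
                + "</dependencies>";
        for (String pair : pairs(xml)) {
            System.out.println(pair);
        }
    }
}
```

The loop over getNextSibling matters because the whitespace between tags in pretty-printed XML is represented as text nodes, so the immediate next sibling of a governor is usually not the dependent element itself.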

To print the list of nodes, you can use:

List<GovernorDependentNode> nodes = getNodes( "followed", inputSource );
for ( GovernorDependentNode node : nodes )
{
    System.out.println( "Word : " + node.word );
    System.out.println( "Governor : " + node.governor.getTextContent() );
    System.out.println( "Dependent : " + node.dependent.getTextContent() );
}

The output is:

Word : followed
Governor : followed
Dependent : production
Word : followed
Governor : followed
Dependent : was
Word : followed
Governor : followed
Dependent : line
Word : followed
Governor : followed
Dependent : stimulation
Word : followed
Governor : followed
Dependent : ,
Word : followed
Governor : followed
Dependent : and
Word : followed
Governor : followed
Dependent : monocytes
Word : followed
Governor : followed
Dependent : macrophages
Word : followed
Governor : followed
Dependent : .