How to get the values of specific elements from a large XML file

Time: 2016-12-21 09:58:39

Tags: java xml sax

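I am a beginner with Java SAX. I have a large XML file from which I want to extract some information. Below are an excerpt of the XML file, what I want to extract, and my code: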

Excerpt from the XML file:

    ...
    <Synset baseConcept="3" id="mizaAj_n2AR">
      <SynsetRelations>
        <SynsetRelation relType="hyponym" targets="TaboE_n2AR"/>
        <SynsetRelation relType="hyponym" targets="TaboE_n2AR"/>
        <SynsetRelation relType="hypernym" targets="ragobap_n4AR"/>
        <SynsetRelation relType="hypernym" targets="ragobap_n4AR"/>
        <SynsetRelation relType="hypernym" targets="Tiybap_Aln~afos_n1AR"/>
        <SynsetRelation relType="hypernym" targets="Tiybap_Aln~afos_n1AR"/>
      </SynsetRelations>
      <MonolingualExternalRefs>
        <MonolingualExternalRef externalReference="04623612-n" externalSystem="PWN30"/>
      </MonolingualExternalRefs>
    </Synset>
    <Synset baseConcept="3" id="ragobap_n4AR">
      <SynsetRelations>
        <SynsetRelation relType="antonym" targets="mizaAj_n2AR"/>
        <SynsetRelation relType="antonym" targets="mizaAj_n2AR"/>
      </SynsetRelations>
      <MonolingualExternalRefs>
        <MonolingualExternalRef externalReference="04624826-n" externalSystem="PWN30"/>
      </MonolingualExternalRefs>
    </Synset>
    <Synset baseConcept="3" id="tasal~uT_n1AR">
      <SynsetRelations>
        <SynsetRelation relType="has_instance" targets="simap_n1AR"/>
        <SynsetRelation relType="is_instance" targets="simap_n1AR"/>
      </SynsetRelations>
      <MonolingualExternalRefs>
        <MonolingualExternalRef externalReference="04625882-n" externalSystem="PWN30"/>
      </MonolingualExternalRefs>
    </Synset>
    ...

What I want (the number of occurrences of each relType value):

    hyponym: 2
    hypernym: 4
    antonym: 2
    has_instance: 1
    is_instance: 1

Code (main class and my handler):

1 answer:

Answer 0 (score: 1)

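import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Counts how often each relType value occurs on <SynsetRelation> elements.
// Declared static so that it can be called directly from main().
public static Map<String, Integer> countElements(File xmlFile) {

    Map<String, Integer> counts = new HashMap<>();

    // try-with-resources closes the file stream even if parsing fails
    try (FileInputStream fileInputStream = new FileInputStream(xmlFile)) {
        XMLInputFactory inputFactory = XMLInputFactory.newInstance();
        XMLStreamReader reader = inputFactory.createXMLStreamReader(fileInputStream);

        while (reader.hasNext()) {
            reader.next();
            if (reader.isStartElement() && reader.getLocalName().equals("SynsetRelation")) {
                // null namespace: look the attribute up by its local name only
                String relTypeValue = reader.getAttributeValue(null, "relType");

                if (relTypeValue == null) {
                    continue; // element without a relType attribute, nothing to count
                }

                // first occurrence of this value: start its counter at 0
                if (!counts.containsKey(relTypeValue)) {
                    counts.put(relTypeValue, 0);
                }

                counts.put(relTypeValue, counts.get(relTypeValue) + 1);
            }
        }

        reader.close();
    } catch (XMLStreamException | IOException e) {
        e.printStackTrace();
    }

    return counts;
}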

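This code uses a stream reader (StAX), which means it only holds one element in memory at a time instead of loading the whole document. This keeps it efficient even for large files.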

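A map is used to keep track of the counts. Each time I encounter a "SynsetRelation" element, I first check whether its relType value has already been counted, and then I increment the counter for that value.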

The result is a map containing the count for each relType value that was found.
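
As an aside, the check-then-increment step described above can be collapsed into a single Map.merge call. The following is only a minimal sketch of that variant, not a replacement for the method above: the class name RelTypeCounter, the Reader-based signature, the wrapper element Synsets, and the sample ids are all made up for illustration, and a TreeMap is used only so that the printed order is stable.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, for illustration only.
class RelTypeCounter {

    // Counts relType values on <SynsetRelation> start tags read from any Reader.
    static Map<String, Integer> count(Reader xml) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys, stable output
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(xml);
            while (reader.hasNext()) {
                // next() returns the type of the event that was just read
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "SynsetRelation".equals(reader.getLocalName())) {
                    String relType = reader.getAttributeValue(null, "relType");
                    if (relType != null) {
                        // inserts 1 for a new key, otherwise adds 1 to the old count
                        counts.merge(relType, 1, Integer::sum);
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample data shaped like the excerpt in the question.
        String xml = "<Synsets>"
                + "<Synset id=\"a\"><SynsetRelations>"
                + "<SynsetRelation relType=\"hyponym\" targets=\"b\"/>"
                + "<SynsetRelation relType=\"hypernym\" targets=\"c\"/>"
                + "<SynsetRelation relType=\"hyponym\" targets=\"d\"/>"
                + "</SynsetRelations></Synset>"
                + "</Synsets>";
        System.out.println(count(new StringReader(xml))); // prints {hypernym=1, hyponym=2}
    }
}
```

For a real file, a FileReader or an InputStream could be passed to createXMLStreamReader instead of the StringReader used here.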

You can call countElements from your main class:

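import java.io.File;
import java.util.Map;

public class Main {
    public static void main(String[] args) {
        // countElements must be declared as a static method of this class to be callable here
        Map<String, Integer> results = countElements(new File("your file location here.xml"));
        results.forEach((relType, count) -> System.out.println(relType + ": " + count));
    }
}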
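
Since the question is tagged sax and mentions a handler, the same counting could also be done with a SAX handler. This is a rough sketch under my own assumptions, not the asker's code (which is not shown): the class name RelTypeHandler, its count helper, and the sample XML are made up. It relies on the default SAXParserFactory not being namespace-aware, so the element name arrives in qName.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler class, for illustration only.
class RelTypeHandler extends DefaultHandler {

    final Map<String, Integer> counts = new TreeMap<>(); // sorted keys, stable output

    // Called by the parser once for every opening tag.
    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        // qName carries the tag name when the parser is not namespace-aware
        if ("SynsetRelation".equals(qName)) {
            String relType = attributes.getValue("relType");
            if (relType != null) {
                counts.merge(relType, 1, Integer::sum);
            }
        }
    }

    // Parses the given source and returns the collected counts.
    static Map<String, Integer> count(InputSource source) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            RelTypeHandler handler = new RelTypeHandler();
            parser.parse(source, handler);
            return handler.counts;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample data shaped like the excerpt in the question.
        String xml = "<Synset id=\"a\"><SynsetRelations>"
                + "<SynsetRelation relType=\"hypernym\" targets=\"b\"/>"
                + "<SynsetRelation relType=\"antonym\" targets=\"c\"/>"
                + "<SynsetRelation relType=\"hypernym\" targets=\"d\"/>"
                + "</SynsetRelations></Synset>";
        System.out.println(count(new InputSource(new StringReader(xml)))); // prints {antonym=1, hypernym=2}
    }
}
```

Like the stream reader, SAX processes the document event by event and does not build a tree, so it should also cope with large files. SAXParser.parse also has an overload that takes a File directly.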