How do I parse a string as XML? How can I read a string as if it were an XML file?

Asked: 2017-06-20 14:51:24

Tags: java xml string stanford-nlp

I have this string:

<dependencies style="typed">
  <dep type="dep">
    <governor idx="1">Maria</governor>
    <dependent idx="2">mrge</dependent>
  </dep>
  <dep type="dep">
    <governor idx="2">mrge</governor>
    <dependent idx="3">la</dependent>
  </dep>
  <dep type="dep">
    <governor idx="1">Maria</governor>
    <dependent idx="4">scoala</dependent>
  </dep>
</dependencies>

I tried to parse it, but I get the exception below and I don't know how to fix it.

This is the error:

3:1: Content is not allowed in prolog.
org.xml.sax.SAXParseException; lineNumber: 3; columnNumber: 1; Content is not allowed in prolog.
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at versionTwo.Analyze.convertStringToDocument(Analyze.java:348)
    at versionTwo.Analyze.depRel(Analyze.java:299)
    at versionTwo.MainClass.main(MainClass.java:17)
Exception in thread "main" java.lang.NullPointerException
    at versionTwo.Analyze.depRel(Analyze.java:300)

This is my code:

    public String depRel(String graph) throws SAXException, IOException,
            ParserConfigurationException {
        String xmlString;
        xmlString = Features.dependencyGraph(graph);
        String result = "";
        System.out.println("Value of the dependency graph: " + xmlString);
        Document document = parseXmlFromString(xmlString);
        document.getDocumentElement().normalize();
        Element root = document.getDocumentElement();
        NodeList nList = document.getElementsByTagName("dependencies");
        for (int temp = 0; temp < nList.getLength(); temp++) {
            Node node = nList.item(temp);
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                // Cast the <dependencies> node to an Element (currently unused)
                Element eElement1 = (Element) node;
            }
            NodeList nodesDocPart = node.getChildNodes();
            for (int temp2 = 0; temp2 < nodesDocPart.getLength(); temp2++) {
                Node n = nodesDocPart.item(temp2);
                // Children of each <dep>: <governor> and <dependent>
                NodeList nodesSentencePart = n.getChildNodes();
                for (int temp3 = 0; temp3 < nodesSentencePart.getLength(); temp3++) {
                    Node sentence = nodesSentencePart.item(temp3);
                    if (sentence.getNodeType() == Node.ELEMENT_NODE) {
                        Element eElement4 = (Element) sentence;
                        System.out.println("Sentence : "
                                + eElement4.getTextContent());
                        // Append rather than overwrite, so every element's text is kept
                        result += eElement4.getTextContent() + "\n";
                    }
                }
            }
        }
        return result;
    }

    public Document parseXmlFromString(String xmlString)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        InputStream inputStream = new ByteArrayInputStream(xmlString.getBytes());
        org.w3c.dom.Document document = builder.parse(inputStream);
        return document;
    }

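This is my method, which builds a String from the XML after parsing a sentence. I want to read this string as XML in another class, but then the error I posted occurs. Any ideas?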

public static String dependencyGraph(String s) {
    Properties props = new Properties();
    props.put("annotators",
            "tokenize, ssplit, pos, lemma, ner, parse, dcoref,depparse");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    Annotation document = new Annotation(s);
    pipeline.annotate(document);
    CoreMap sentence = document.get(
            CoreAnnotations.SentencesAnnotation.class).get(0);
    SemanticGraph dependency_graph = sentence
            .get(SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation.class);

    String newLine = System.getProperty("line.separator");
    //convert the output format to a string

    String graph = "\n\nDependency Graph: "
            + dependency_graph.toString(SemanticGraph.OutputFormat.XML)//save the answer like a String from the xml
            + newLine;
    // System.out.println("The graph was made=>" + graph);
    return graph;

}

1 answer:

Answer 0 (score: 0)

In dependencyGraph(String) you do

String graph = "\n\nDependency Graph: "
           + dependency_graph.toString(SemanticGraph.OutputFormat.XML);

which creates a string that starts with two newlines and the text "Dependency Graph: ".

You then assign that string to a variable:

String xmlString;
xmlString = Features.dependencyGraph(graph);

and then try to parse it as XML:

Document document = parseXmlFromString(xmlString);

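But a string that starts with two newlines and the text "Dependency Graph: " is not well-formed XML, so the XML parser complains: at line 3, column 1 (right after the two newlines) it found something that cannot be part of the prolog of an XML document.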
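To see the effect in isolation, here is a minimal sketch that parses the same XML once on its own and once with the "\n\nDependency Graph: " prefix. The class name `PrologErrorDemo` and the helper `tryParse` are made up for this illustration; only standard JAXP classes are used.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class PrologErrorDemo {

    /** Returns "ok" if the text parses, otherwise "line:column" of the parse error. */
    static String tryParse(String text) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr first.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(text)));
            return "ok";
        } catch (SAXParseException e) {
            return e.getLineNumber() + ":" + e.getColumnNumber();
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<dependencies style=\"typed\"></dependencies>";
        // Pure XML: parses without complaint.
        System.out.println(tryParse(xml));
        // Same XML behind the label text: the parser rejects the text before the root element.
        System.out.println(tryParse("\n\nDependency Graph: " + xml));
    }
}
```

The first call succeeds; the second fails before the parser ever reaches `<dependencies>`, which matches the position reported in the stack trace of the question.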

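So the answer to the question in the title is: if you want to parse a string as XML, the string must contain well-formed XML and nothing else.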
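The simplest fix is probably to have dependencyGraph(String) return only `dependency_graph.toString(SemanticGraph.OutputFormat.XML)`, without the label in front. If the label has to stay, one possible workaround is to cut everything before the first `<` prior to parsing. The sketch below assumes that approach; `DependencyXmlReader`, `stripToXml`, `parse` and `describe` are hypothetical names, not part of the question's code or of Stanford CoreNLP.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DependencyXmlReader {

    /** Drops everything before the first '<' so that only the XML part remains. */
    static String stripToXml(String text) {
        int start = text.indexOf('<');
        if (start < 0) {
            throw new IllegalArgumentException("No XML element found in: " + text);
        }
        return text.substring(start).trim();
    }

    /** Parses a string that contains well-formed XML into a DOM document. */
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // A StringReader avoids the platform-dependent encoding of String.getBytes().
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    /** Returns one "governor -> dependent" line per <dep> element. */
    static String describe(Document document) {
        StringBuilder result = new StringBuilder();
        NodeList deps = document.getElementsByTagName("dep");
        for (int i = 0; i < deps.getLength(); i++) {
            Element dep = (Element) deps.item(i);
            String governor = dep.getElementsByTagName("governor").item(0).getTextContent();
            String dependent = dep.getElementsByTagName("dependent").item(0).getTextContent();
            result.append(governor).append(" -> ").append(dependent).append('\n');
        }
        return result.toString();
    }

    public static void main(String[] args) throws Exception {
        // Same shape as the string built in the question, shortened to one <dep>.
        String graph = "\n\nDependency Graph: "
                + "<dependencies style=\"typed\">"
                + "<dep type=\"dep\"><governor idx=\"1\">Maria</governor>"
                + "<dependent idx=\"2\">mrge</dependent></dep>"
                + "</dependencies>\n";
        System.out.print(describe(parse(stripToXml(graph)))); // prints "Maria -> mrge"
    }
}
```

Looking up the `dep`, `governor` and `dependent` elements by tag name also sidesteps the whitespace text nodes that the nested getChildNodes() loops in the question have to skip.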