Question

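I am trying to read from an XML file and store the data in a text file. My code reads and stores the data fine, except when a paragraph in the XML file contains double quotes.

For example: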

    <Agent> "The famous spy" James Bond </Agent>

The output drops all of the quoted text, so the result is: James Bond

I am using SAX, and this is the part of my code that may be causing the problem:

 public void characters(char[] ch, int start, int length) throws SAXException 
  { 
        tempVal = new String(ch, start, length); 
  }

I think I should replace the quotes before storing the string in tempVal.

Any ideas?

Here is the complete code, just in case:

public class Entailment {

  private String Text;

  private String Hypothesis;

  private String ID;

  private String Entailment;

}

//Event Handlers
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
    //reset
    tempVal = "";
    if(qName.equalsIgnoreCase("pair")) {
        //create a new instance of Entailment
        tempEntailment = new Entailment();
        tempEntailment.setID(attributes.getValue("id"));
        tempEntailment.setEntailment(attributes.getValue("entailment"));
    }
}

public void characters(char[] ch, int start, int length) throws SAXException {
    tempVal = new String(ch, start, length);
}

public void endElement(String uri, String localName, String qName) throws SAXException {
    if(qName.equalsIgnoreCase("pair")) {
        //add it to the list
        Entailments.add(tempEntailment);
    }else if (qName.equalsIgnoreCase("t")) {
        tempEntailment.setText(tempVal);
    }else if (qName.equalsIgnoreCase("h")) {
        tempEntailment.setHypothesis(tempVal);
    }
}

public static void main(String[] args){
    XMLtoTXT spe = new XMLtoTXT();
    spe.runExample();
}

Answer 1

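Your characters() method is being called multiple times, because the parser is treating the input as several adjacent text nodes. The way your code is written (which you didn't show), you are probably keeping only the last text node.

You need to accumulate the contents of adjacent text nodes yourself.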

StringBuilder tempVal = null;

public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
    //reset
    tempVal = new StringBuilder();
    ....
}

public void characters(char[] ch, int start, int length) throws SAXException {
    tempVal.append(ch, start, length);
}

public void endElement(String uri, String localName, String qName) throws SAXException {
    String textValue = tempVal.toString();
    ....
}
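
For reference, here is a minimal self-contained sketch of the accumulation approach described above. The class name QuoteAccumulationDemo, the AgentHandler class, and the parseAgent helper are hypothetical names made up for this illustration; they are not from the question's code. The sample input assumes the quotes are written as &quot; entities, which is one common reason a parser delivers the text in several characters() calls; the question does not say how the quotes were actually encoded.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: the names here are illustrative, not from the original post.
class QuoteAccumulationDemo {

    // Collects the full text of an <Agent> element, however many
    // characters() calls the parser uses to deliver it.
    static class AgentHandler extends DefaultHandler {
        private final StringBuilder buffer = new StringBuilder();
        String agentText;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            buffer.setLength(0); // reset the buffer at the start of every element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            buffer.append(ch, start, length); // append instead of overwrite
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equalsIgnoreCase("Agent")) {
                agentText = buffer.toString().trim();
            }
        }
    }

    // Parses the given XML string and returns the trimmed text of the <Agent> element.
    static String parseAgent(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            AgentHandler handler = new AgentHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.agentText;
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the quotes are encoded as &quot; entities in the source file.
        String xml = "<Agent> &quot;The famous spy&quot; James Bond </Agent>";
        System.out.println(parseAgent(xml));
    }
}
```

Because the handler appends in characters() and only reads the buffer in endElement(), the quoted part should be kept no matter how the parser chooses to split the text. No quote replacement is needed.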

Answer 2

Interesting. I simulated your situation and my SAX parser works fine. I am using JDK 1.6.0_20, and this is how I create my parser:

  // Obtain a new instance of a SAXParserFactory.
  SAXParserFactory factory = SAXParserFactory.newInstance();
  // Specifies that the parser produced by this code will provide support for XML namespaces.
  factory.setNamespaceAware(true);
  // Specifies that the parser produced by this code will validate documents as they are parsed.
  factory.setValidating(true);
  // Creates a new instance of a SAXParser using the currently configured factory parameters.
  saxParser = factory.newSAXParser();

My XML header is:

<?xml version="1.0" encoding="iso-8859-1"?>

What about yours?

Quote problem when reading from an XML file in Java
