Converting text to XML with the StAX API

Time: 2015-03-09 18:42:59

Tags: java xml

I can't get all the sentences written out correctly. How can I fix this? I want output like this:

<text>
    <sentence>
        <word>a</word>
        <word>had</word>
        <word>lamb</word>
        <word>little</word>
        <word>Mary</word>
    </sentence>
    <sentence>
        <word>Aesop</word>
        <word>and</word>
        <word>called</word>
        <word>came</word>
        <word>for</word>
        <word>Peter</word>
        <word>the</word>
        <word>wolf</word>
    </sentence>
    <sentence>
        <word>Cinderella</word>
        <word>likes</word>
        <word>shoes</word>
    </sentence>
</text>

But I get this instead:

<text>
    <sentence>
        <word>a</word>
        <word>had</word>
        <word>lamb</word>
        <word>little</word>
        <word>Mary</word>
    </sentence>
</text>

Sample text

Mary had a little lamb.

Peter called for the wolf, and Aesop came. Cinderella likes shoes...

My class

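import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;

import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Characters;
import javax.xml.stream.events.EndDocument;
import javax.xml.stream.events.EndElement;
import javax.xml.stream.events.StartDocument;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

public class StaxWriteXmlTest {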

    /**
     * @param args
     * @throws FileNotFoundException
     * @throws XMLStreamException
     */
    public static void main(String[] args) throws FileNotFoundException,
            XMLStreamException {
        String[] word = initItems();

        // xml event writer with output stream
        // XMLOutputFactory xmlOutFactory = XMLOutputFactory.newInstance();
        // OutputStream outputStream = new FileOutputStream("D:\\word.xml");
        // XMLEventWriter eventWriter = xmlOutFactory
        // .createXMLEventWriter(outputStream);
        XMLEventWriter eventWriter = XMLOutputFactory.newInstance()
                .createXMLEventWriter(System.out);

        XMLEventFactory eventFactory = XMLEventFactory.newInstance();
        XMLEvent end = createNewLine(eventFactory);
        XMLEvent tab = createTab(eventFactory);

        // Create start tag
        StartDocument startDocument = eventFactory.createStartDocument();
        EndDocument endDocument = eventFactory.createEndDocument();
        eventWriter.add(startDocument);

        // create the root <text> open tag
        eventWriter.add(end);
        StartElement configStartElement = eventFactory.createStartElement("",
                "", "text");
        eventWriter.add(configStartElement);
        eventWriter.add(end);

        eventWriter.add(tab);
        StartElement itemStartElement = eventFactory.createStartElement("", "",
                "sentence");
        eventWriter.add(itemStartElement);
        eventWriter.add(end);
        eventWriter.add(tab);

        // add words
        for (String words : word) {
            eventWriter.add(tab);
            createItemNode(eventFactory, eventWriter, "word", words);
            eventWriter.add(tab);
        }
        // eventWriter.add(tab);
        EndElement itemEndElement = eventFactory.createEndElement("", "",
                "sentence");
        eventWriter.add(itemEndElement);
        eventWriter.add(end);

        EndElement configEndElement = eventFactory.createEndElement("", "",
                "text");
        eventWriter.add(configEndElement);
        eventWriter.add(end);

        eventWriter.add(endDocument);
        eventWriter.flush();
        eventWriter.close();

    }

    public static void createItemNode(XMLEventFactory eventFactory,
            XMLEventWriter eventWriter, String elementName, String value)
            throws XMLStreamException {
        XMLEvent end = eventFactory.createDTD("\n");
        StartElement startElement = eventFactory.createStartElement("", "",
                elementName);
        eventWriter.add(startElement);
        Characters characters = eventFactory.createCharacters(value);
        eventWriter.add(characters);
        EndElement endElement = eventFactory.createEndElement("", "",
                elementName);
        eventWriter.add(endElement);
        eventWriter.add(end);
    }

    public static XMLEvent createTab(XMLEventFactory eventFactory) {
        return eventFactory.createDTD("\t");
    }

    public static XMLEvent createNewLine(XMLEventFactory eventFactory) {
        return eventFactory.createDTD("\n");
    }

    public static String[] initItems() {

        FileReader fr = null;

        try {
            fr = new FileReader("text.txt");
        } catch (FileNotFoundException e1) {

            e1.printStackTrace();
        }
        BufferedReader inputText = new BufferedReader(fr);
        String text = "", newText = "";
        String allTogether = "";
        String[] nexSplit = {};
        try {
            while ((text = inputText.readLine()) != null) {
                newText += text.replaceAll("\\s+", " ").replaceAll(" ,", ",")
                        .replaceAll(" \\.", ".").replaceAll("\\..", ".");

                allTogether = newText.replaceAll("\\s+", " ");

            }

            String[] splitText = allTogether.split("[.]");

            for (int i = 0; i < splitText.length; i++) {
                nexSplit = splitText[i].split("[ \t]");
                Arrays.sort(nexSplit, String.CASE_INSENSITIVE_ORDER);
                return nexSplit;

            }

        } catch (IOException e) {
            e.printStackTrace();
        }
        return nexSplit;

    }


}

1 answer:

Answer 0 (score: 0)

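The problem is in initItems: the statement return nexSplit; sits inside the for loop, so the method returns right after the first sentence. You have to collect the sorted words of each sentence in a List and return that list once the loop has finished. Here are the lines of initItems that need to change: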

public static List<String[]> initItems() {  // RETURN A LIST
    List<String[]> sents = new ArrayList<>();  // declare new List
    // ...
        for (int i = 0; i < splitText.length; i++) {
            nexSplit = splitText[i].split("[ \t]");
            Arrays.sort(nexSplit, String.CASE_INSENSITIVE_ORDER);
            sents.add(nexSplit);  // APPEND WORDS OF ANOTHER SENTENCE
        }

    return sents;  // RETURN THE LIST OF WORDS-OF-A-SENTENCE
}
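
For reference, here is a minimal self-contained sketch of the same idea. It is not the asker's exact method. The class name SentenceSplitter and the method splitSentences are made up for illustration, and it takes the text as a String instead of reading text.txt. It also assumes that commas should be dropped, since the desired output shows wolf without a comma, and that empty pieces between consecutive dots should be skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original question or answer.
class SentenceSplitter {

    // Splits the text into sentences at '.', splits each sentence into words
    // at whitespace, and sorts the words of each sentence case-insensitively.
    static List<String[]> splitSentences(String text) {
        List<String[]> sentences = new ArrayList<>();
        for (String sentence : text.split("[.]")) {
            // Assumption: commas are not wanted in the output, so replace them.
            String cleaned = sentence.replace(",", " ").trim();
            if (cleaned.isEmpty()) {
                continue; // skip empty pieces, e.g. between consecutive dots
            }
            String[] words = cleaned.split("\\s+");
            Arrays.sort(words, String.CASE_INSENSITIVE_ORDER);
            sentences.add(words); // collect the sentence instead of returning early
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "Mary had a little lamb. "
                + "Peter called for the wolf, and Aesop came. "
                + "Cinderella likes shoes..";
        for (String[] words : splitSentences(text)) {
            System.out.println(Arrays.toString(words));
        }
    }
}
```

The key point is that every sentence is added to the list inside the loop, and the single return statement comes after the loop.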

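Of course, the main program now has to handle a List<String[]>: it must loop over the list and write one sentence element per entry.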
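
One way the main program might handle the List<String[]> is sketched below, with an outer loop over the sentences and an inner loop over the words. The class name SentenceXmlWriter and the method toXml are hypothetical. The sketch writes to a StringWriter instead of System.out so the result can be inspected, and it emits the newlines and tabs with createCharacters instead of the createDTD calls used in the question.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;

import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;

// Hypothetical helper, not part of the original question or answer.
class SentenceXmlWriter {

    // Writes one <sentence> element per array and one <word> element per string.
    static String toXml(List<String[]> sentences) {
        StringWriter out = new StringWriter();
        try {
            XMLEventWriter writer =
                    XMLOutputFactory.newInstance().createXMLEventWriter(out);
            XMLEventFactory events = XMLEventFactory.newInstance();

            writer.add(events.createStartDocument());
            writer.add(events.createStartElement("", "", "text"));
            writer.add(events.createCharacters("\n"));

            for (String[] words : sentences) {          // outer loop: sentences
                writer.add(events.createCharacters("\t"));
                writer.add(events.createStartElement("", "", "sentence"));
                writer.add(events.createCharacters("\n"));
                for (String word : words) {             // inner loop: words
                    writer.add(events.createCharacters("\t\t"));
                    writer.add(events.createStartElement("", "", "word"));
                    writer.add(events.createCharacters(word));
                    writer.add(events.createEndElement("", "", "word"));
                    writer.add(events.createCharacters("\n"));
                }
                writer.add(events.createCharacters("\t"));
                writer.add(events.createEndElement("", "", "sentence"));
                writer.add(events.createCharacters("\n"));
            }

            writer.add(events.createEndElement("", "", "text"));
            writer.add(events.createEndDocument());
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            // Wrapped only to keep the sketch easy to call.
            throw new IllegalStateException("could not write XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String[]> sentences = Arrays.asList(
                new String[] {"a", "had", "lamb", "little", "Mary"},
                new String[] {"Cinderella", "likes", "shoes"});
        System.out.println(toXml(sentences));
    }
}
```

In the asker's program, the list passed to toXml would be the value returned by the changed initItems.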