XML encoding problem

Asked: 2011-03-16 15:09:13

Tags: java xml

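I am using the iterator-style API of StAX to parse an XML stream.

I wrote a small program that splits a large XML file into several smaller files.

The input stream is read correctly, but the files I write contain strange characters (an encoding problem).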

public static void main(String[] args) throws Exception
{

        int offre=0;
        int i=0,j=0;
        String Data="";
        String nom="flux0.xml";
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(new java.io.FileInputStream("CJ.xml"));
        FileOutputStream output = new FileOutputStream(nom);
        XMLOutputFactory xmlof = XMLOutputFactory.newInstance();
        XMLEventWriter writer = xmlof.createXMLEventWriter(output);
        XMLEventFactory eventFactory = XMLEventFactory.newInstance();
        while (reader.hasNext() /*&& j<3000*/)
        {
            XMLEvent event = (XMLEvent) reader.next();

            if (event.isStartElement())
            {
                if ("OFFER".equals(event.asStartElement().getName().getLocalPart()))
                {
                    offre++;
                }
            }
            if(offre==5000)
            {
                i++;
                nom="flux"+i+".xml";
                output = new FileOutputStream(nom);
                writer= xmlof.createXMLEventWriter(output);


                if (event.getEventType() == event.CHARACTERS)
                {

                    Characters characters = event.asCharacters();
                    String texte=characters.getData();
                    CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder();
                    Data= new String(encoder.encode(CharBuffer.wrap(texte.toCharArray())).array());
                    writer.add(eventFactory.createCharacters(Data));
                }
                  else
                  {
                    writer.add(event);
                  }
                nom="flux"+i+".xml";
                offre=0;
            }
              else
              {
                if (event.getEventType() == event.CHARACTERS)
                {
                    Characters characters = event.asCharacters();
                    String texte=characters.getData();
                    CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder();
                    Data= new String(encoder.encode(CharBuffer.wrap(texte.toCharArray())).array());
                    writer.add(eventFactory.createCharacters(Data));
                }
                  else
                  {
                    writer.add(event);
                  }
               }
               writer.flush();
        }
}

2 answers:

Answer 0 (score: 0)

With this code, the character encoding is forced on your writer:

    String outputEncoding = "UTF-8";
    FileOutputStream fos = new FileOutputStream(aFile);
    OutputStreamWriter osw = new OutputStreamWriter(fos, outputEncoding);
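
A minimal sketch of how such an `OutputStreamWriter` might be handed to StAX, assuming the standard `XMLOutputFactory.createXMLEventWriter(Writer)` overload. The class `WriterEncodingSketch` and the helper `writeOffer` are hypothetical names used only for illustration; the `OFFER` element name is taken from the question.

```java
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLOutputFactory;

class WriterEncodingSketch {
    // Hypothetical helper: writes a single <OFFER> element containing `text`
    // to `out`, with the bytes produced by a UTF-8 OutputStreamWriter.
    static void writeOffer(OutputStream out, String text) throws Exception {
        Writer osw = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(osw);
        XMLEventFactory events = XMLEventFactory.newInstance();

        // Declare the same encoding that the OutputStreamWriter uses.
        writer.add(events.createStartDocument("UTF-8", "1.0"));
        writer.add(events.createStartElement("", "", "OFFER"));
        writer.add(events.createCharacters(text)); // plain String, no manual encoding
        writer.add(events.createEndElement("", "", "OFFER"));
        writer.add(events.createEndDocument());

        writer.close(); // closes the XML writer, not the underlying stream
        osw.close();    // flushes and closes the underlying stream
    }
}
```

When a `Writer` is supplied, the bytes are produced by the `OutputStreamWriter`, so the encoding named in the XML declaration should match the one passed to it.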

Answer 1 (score: 0)

Isn't this block of code completely unnecessary?

Characters characters = event.asCharacters();
String texte=characters.getData();
CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder();
Data= new String(encoder.encode(CharBuffer.wrap(texte.toCharArray())).array());
writer.add(eventFactory.createCharacters(Data));
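
To illustrate why that block can produce the odd characters, here is a minimal sketch of the same conversion. It relies on two facts: `new String(byte[])` decodes with the platform default charset, and `ByteBuffer.array()` returns the whole backing array, which may be larger than the encoded content. The class `RoundTripSketch`, the method `questionStyle`, and the `decodeWith` parameter (standing in for the platform default) are hypothetical names for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RoundTripSketch {
    // Mirrors the conversion from the question, but makes the decoding
    // charset explicit instead of relying on the platform default.
    static String questionStyle(String text, Charset decodeWith) throws Exception {
        // Step 1: encode the characters to UTF-8 bytes.
        ByteBuffer buf = StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(text));
        // Step 2: decode the whole backing array again. If decodeWith is not
        // UTF-8, every non-ASCII character turns into several wrong ones, and
        // unused trailing bytes of the array can show up as extra characters.
        return new String(buf.array(), decodeWith);
    }

    public static void main(String[] args) throws Exception {
        String original = "caf\u00e9";
        String garbled = questionStyle(original, StandardCharsets.ISO_8859_1);
        System.out.println(original.equals(garbled)); // the text no longer matches
    }
}
```

A Java `String` has no byte encoding of its own, so the text from `getData()` can go to the writer unchanged; the encoding only matters where characters become bytes.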

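Why can't you simply pass the event to the writer, as you do with the other events? If you need the file in a specific encoding, there is a factory method that takes the charset as a parameter: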

FileOutputStream output = new FileOutputStream(nom);
XMLOutputFactory xmlof = XMLOutputFactory.newInstance();
XMLEventWriter writer = xmlof.createXMLEventWriter(output, "utf-8");
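
Putting both points together, a minimal pass-through sketch could look like the following. It assumes the input is a well-formed XML stream and leaves out the splitting logic from the question; the class `PassThroughSketch` and the method `copy` are hypothetical names for illustration.

```java
import java.io.InputStream;
import java.io.OutputStream;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class PassThroughSketch {
    // Copies every event from `in` to `out` unchanged. The output encoding is
    // chosen once, when the writer is created, not per Characters event.
    static void copy(InputStream in, OutputStream out) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out, "UTF-8");
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                writer.add(event); // Characters events are passed along like any other
            }
        } finally {
            writer.close(); // flushes buffered output; does not close `out`
            reader.close();
        }
    }
}
```

For the splitter, the same idea applies: each time a new output file is opened, create its writer with `createXMLEventWriter(output, "UTF-8")` and add the events without converting them.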