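I am using the iterator-style API to parse an XML stream with StAX.
I wrote a small piece of code that splits a large XML file into several files.
Reading the stream works correctly, but when writing I get files containing strange characters (an encoding problem):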
public static void main(String[] args) throws Exception
{
    int offre=0;
    int i=0,j=0;
    String Data="";
    String nom="flux0.xml";
    XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(new java.io.FileInputStream("CJ.xml"));
    FileOutputStream output = new FileOutputStream(nom);
    XMLOutputFactory xmlof = XMLOutputFactory.newInstance();
    XMLEventWriter writer = xmlof.createXMLEventWriter(output);
    XMLEventFactory eventFactory = XMLEventFactory.newInstance();
    while (reader.hasNext() /*&& j<3000*/)
    {
        XMLEvent event = (XMLEvent) reader.next();
        if (event.isStartElement())
        {
            if (event.asStartElement().getName().getLocalPart() == "OFFER")
            {
                offre++;
            }
        }
        if(offre==5000)
        {
            i++;
            nom="flux"+i+".xml";
            output = new FileOutputStream(nom);
            writer= xmlof.createXMLEventWriter(output);
            if (event.getEventType() == event.CHARACTERS)
            {
                Characters characters = event.asCharacters();
                String texte=characters.getData();
                CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder();
                Data= new String(encoder.encode(CharBuffer.wrap(texte.toCharArray())).array());
                writer.add(eventFactory.createCharacters(Data));
            }
            else
            {
                writer.add(event);
            }
            nom="flux"+i+".xml";
            offre=0;
        }
        else
        {
            if (event.getEventType() == event.CHARACTERS)
            {
                Characters characters = event.asCharacters();
                String texte=characters.getData();
                CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder();
                Data= new String(encoder.encode(CharBuffer.wrap(texte.toCharArray())).array());
                writer.add(eventFactory.createCharacters(Data));
            }
            else
            {
                writer.add(event);
            }
        }
        writer.flush();
    }
}
Answer 0 (score: 0)
With this code, the character encoding is forced on your writer:
String outputEncoding = "UTF-8";
FileOutputStream fos = new FileOutputStream(aFile);
OutputStreamWriter osw = new OutputStreamWriter(fos, outputEncoding);
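The three lines above stop before the writer is handed to StAX. A minimal sketch of one way to connect them, assuming the `OutputStreamWriter` is passed to `XMLOutputFactory.createXMLEventWriter(Writer)`; the class name `WriterEncodingSketch`, the method `writeSample`, and the in-memory sink are hypothetical and only there to keep the example self-contained:

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLOutputFactory;

final class WriterEncodingSketch {

    // Writes <OFFER>text</OFFER> through an OutputStreamWriter that is fixed to UTF-8
    // and returns the raw bytes that reached the underlying stream.
    static byte[] writeSample(String text) throws Exception {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for a FileOutputStream
        try (Writer osw = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(osw);
            XMLEventFactory events = XMLEventFactory.newInstance();
            // Declare the same encoding in the XML prolog that the Writer really uses.
            writer.add(events.createStartDocument("UTF-8", "1.0"));
            writer.add(events.createStartElement("", "", "OFFER"));
            writer.add(events.createCharacters(text)); // plain String, no manual re-encoding
            writer.add(events.createEndElement("", "", "OFFER"));
            writer.add(events.createEndDocument());
            writer.close(); // flushes StAX buffers; the try block then closes osw
        }
        return sink.toByteArray();
    }
}
```

The point of the design is that the byte encoding is decided in exactly one place (the `OutputStreamWriter`), while the application only ever handles `String` values.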
Answer 1 (score: 0)
Isn't this block of code completely unnecessary?
Characters characters = event.asCharacters();
String texte=characters.getData();
CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder();
Data= new String(encoder.encode(CharBuffer.wrap(texte.toCharArray())).array());
writer.add(eventFactory.createCharacters(Data));
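As a rough illustration of why the block can do harm rather than merely nothing: a Java `String` is already Unicode, so encoding it to UTF-8 bytes and then building a new `String` from those bytes only round-trips if the decoding charset is also UTF-8. The sketch below assumes the asker's platform default is not UTF-8 and uses ISO-8859-1 as a stand-in; `ReEncodeDemo` and `reEncode` are hypothetical names. Separately, `ByteBuffer.array()` returns the whole backing array, which can be longer than the encoded content, so the original block may also pick up stray trailing bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ReEncodeDemo {

    // Mimics the block above: encode to UTF-8 bytes, then decode them with some charset.
    // In the original code the decoding charset is the (unspecified) platform default.
    static String reEncode(String text, Charset decodeWith) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, decodeWith);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "café", 4 characters

        // Decoding with a different charset turns the 2-byte "é" into two characters
        // (U+00C3 and U+00A9), so the result has 5 characters.
        String mangled = reEncode(original, StandardCharsets.ISO_8859_1);
        System.out.println(mangled.length());

        // Decoding with UTF-8 gives back the original, i.e. the round trip is a no-op.
        System.out.println(reEncode(original, StandardCharsets.UTF_8).equals(original));
    }
}
```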
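Why can't you simply pass the event to the writer, as you do for the other events? If you need the file in a specific encoding, there is a factory method that takes the charset as a parameter: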
FileOutputStream output = new FileOutputStream(nom);
XMLOutputFactory xmlof = XMLOutputFactory.newInstance();
XMLEventWriter writer = xmlof.createXMLEventWriter(output, "utf-8");
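Putting this advice together, here is a hedged sketch of how the splitter could look when every event is passed through unchanged and the encoding is set once on the factory call. It is not the asker's code: it assumes the `OFFER` elements sit somewhere under a single root element, it repeats that root in every output file so each piece stays well-formed, and it drops anything outside an `OFFER`. The names `OfferSplitter`, `split`, `finish`, and the `perFile` parameter are hypothetical.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

final class OfferSplitter {

    // Copies OFFER elements from 'in' into dir/flux0.xml, dir/flux1.xml, ...
    // with at most 'perFile' offers per file. Returns the number of files written.
    static int split(InputStream in, Path dir, int perFile) throws Exception {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
        XMLEventFactory events = XMLEventFactory.newInstance();

        StartElement root = null;    // first start element of the input, repeated in each file
        XMLEventWriter writer = null;
        OutputStream out = null;
        int files = 0, inFile = 0, depth = 0; // depth > 0 means "inside an OFFER"

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (depth > 0) {
                // Inside an OFFER: pass every event through untouched, text included.
                if (event.isStartElement()) depth++;
                else if (event.isEndElement()) depth--;
                writer.add(event);
            } else if (event.isStartElement()) {
                StartElement start = event.asStartElement();
                if (root == null) {
                    root = start;
                } else if ("OFFER".equals(start.getName().getLocalPart())) { // equals, not ==
                    if (writer == null || inFile == perFile) {
                        finish(writer, out, root, events); // close the previous file, if any
                        out = Files.newOutputStream(dir.resolve("flux" + files + ".xml"));
                        writer = outputFactory.createXMLEventWriter(out, "UTF-8");
                        writer.add(events.createStartDocument("UTF-8", "1.0"));
                        writer.add(root);
                        files++;
                        inFile = 0;
                    }
                    inFile++;
                    depth = 1;
                    writer.add(event);
                }
            }
        }
        finish(writer, out, root, events);
        reader.close();
        return files;
    }

    // Closes the wrapper element and the document, then releases the stream.
    // XMLEventWriter.close() does not close the underlying stream, so that is done here.
    private static void finish(XMLEventWriter writer, OutputStream out,
                               StartElement root, XMLEventFactory events) throws Exception {
        if (writer == null) return;
        QName name = root.getName();
        writer.add(events.createEndElement(name.getPrefix(), name.getNamespaceURI(), name.getLocalPart()));
        writer.add(events.createEndDocument());
        writer.close();
        out.close();
    }
}
```

A call such as `OfferSplitter.split(new FileInputStream("CJ.xml"), Paths.get("."), 5000)` would mirror the file names and batch size from the question. For brevity the sketch does not release the streams if an exception is thrown midway; production code would want a `try`/`finally` around the loop.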