This is a continuation of another question. When I try to parse my XML file, I get this error.
Exception in thread "main" org.xml.sax.SAXParseException; lineNumber: 68; columnNumber: 12; Content is not allowed in trailing section.
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$TrailingMiscDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(Unknown Source)
at javax.xml.parsers.SAXParser.parse(Unknown Source)
at convert.ExcelXmlReader.getAndParseFile(ExcelXmlReader.java:55)
at convert.ExcelXmlReader.main(ExcelXmlReader.java:24)
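The "lineNumber: 68; columnNumber: 12;" part matches the last '>' in my XML file. When I tried deleting the whitespace after it, I still got the error. I tried running the file through an XML validator, but it did not flag anything. I'm really not sure what I'm doing wrong. I tried the solutions from other Stack Overflow questions (inspecting my file, looking for stray characters after the end of the XML, making sure all tags are closed), but none of them worked for me.
Does anyone have any tips on where I should go from here? Which direction is best?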
<?xml version="1.0" encoding="utf-16"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
<Author>marc</Author>
<LastAuthor>ESDI</LastAuthor>
</DocumentProperties>
<ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
<WindowHeight>7560</WindowHeight>
<WindowWidth>12300</WindowWidth>
<WindowTopX>360</WindowTopX>
<WindowTopY>135</WindowTopY>
<ProtectStructure>False</ProtectStructure>
<ProtectWindows>False</ProtectWindows>
</ExcelWorkbook>
<Styles>
<Style ss:ID="Default" ss:Name="Normal">
<Alignment ss:Vertical="Bottom"/>
<Borders/>
<Font/>
<Interior/>
<NumberFormat/>
<Protection/>
</Style>
<Style ss:ID="s21">
<NumberFormat ss:Format="Short Date"/>
</Style>
</Styles>
<Worksheet ss:Name="Sheet1">
<Table x:FullColumns="1" x:FullRows="1">
<Row>
<Cell><Data ss:Type="String">Crt. Dte</Data></Cell>
<Cell><Data ss:Type="String">WR Status</Data></Cell>
<Cell><Data ss:Type="String">Request Plant</Data></Cell>
<Cell><Data ss:Type="String">Request #</Data></Cell>
<Cell><Data ss:Type="String">Item#</Data></Cell>
<Cell><Data ss:Type="String">Request Cost Center</Data></Cell>
<Cell><Data ss:Type="String">WR Description</Data></Cell>
<Cell><Data ss:Type="String">W/O No</Data></Cell>
<Cell><Data ss:Type="String">Charge Plant</Data></Cell>
<Cell><Data ss:Type="String">Charge Cost Center</Data></Cell>
<Cell><Data ss:Type="String">Equip NO</Data></Cell>
<Cell><Data ss:Type="String">Equipment Name</Data></Cell>
<Cell><Data ss:Type="String">Required Date</Data></Cell>
<Cell><Data ss:Type="String">WO Type</Data></Cell>
<Cell><Data ss:Type="String">Exec. C/C</Data></Cell>
<Cell><Data ss:Type="String">Exec. Plant</Data></Cell>
<Cell><Data ss:Type="String">Plant1</Data></Cell>
<Cell><Data ss:Type="String">Area</Data></Cell>
<Cell><Data ss:Type="String">Confirmed</Data></Cell>
<Cell><Data ss:Type="String">WO Status</Data></Cell>
<Cell><Data ss:Type="String">W/R Requester</Data></Cell>
</Row>
</Table>
<WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
<Selected/>
<ProtectObjects>False</ProtectObjects>
<ProtectScenarios>False</ProtectScenarios>
</WorksheetOptions>
</Worksheet>
</Workbook>
The current parsing code. Most of the other code is in the previous question linked above.
private static void getAndParseFile() throws Exception {
System.out.println("getAndParseFile");
String fileName="C:\\Users\\windowsUserName\\Downloads\\F7BAH1P_List.xml";
File file = new File(fileName);
removeLineFromFile(file.getAbsolutePath());
System.out.println("Finished Removing Lines");
String fileContent = IOUtils.toString(new FileInputStream(file));
fileContent = fileContent.substring(0, fileContent.lastIndexOf('>')+1);
fileContent = fileContent.replaceAll("&#","");
PrintWriter pw = null;
pw = new PrintWriter(new FileWriter("C:\\Users\\windowsUserName\\Downloads\\tempfile.txt"));
pw.println(fileContent);
pw.flush();
ByteArrayInputStream bis = new ByteArrayInputStream(Charset.forName("UTF-16").encode(fileContent).array());
SAXParserFactory parserFactor = SAXParserFactory.newInstance();
SAXParser parser = parserFactor.newSAXParser();
SAXHandler handler = new SAXHandler();
parser.parse(bis, handler);
}
removeLineFromFile removes two <Row></Row> elements from the beginning and two from the end of the XML file; these are either blank or contain counter/header data.
private static void removeLineFromFile(String file) {
BufferedReader br = null;
PrintWriter pw = null;
try {
File inFile = new File(file);
if (!inFile.isFile()) {
return;
}
br = new BufferedReader(new FileReader(file));
String line = null;
int totalRows=0;
boolean continueMethod = false;
//Count total number of rows in file
while ((line = br.readLine()) != null) {
//check if file is already formatted
if (line.contains("List for Work")){
continueMethod = true;
}
if (line.toLowerCase().contains("</row>")){
++totalRows;
}
}
if (continueMethod)
{
//Create a temporary file to hold the file with deleted lines.
File tempFile = new File(inFile.getAbsolutePath() + ".tmp");
pw = new PrintWriter(new FileWriter(tempFile));
line = null;
br.close();
br = null;
br = new BufferedReader(new FileReader(file));
boolean ignoreMe = false;
int rowCounter = 0;
int rowCloser = 0;
//begin cycling through file and writing to new one.
while((line = br.readLine()) != null)
{
//if runs into a row, count it.
if (line.toLowerCase().contains("<row>")){
rowCounter++;
}
if (line.toLowerCase().contains("</row>")){
rowCloser++;
}
//Delete the first two, and last two lines
if ((rowCounter == 1 ) || (rowCounter == 2) || (rowCounter == (totalRows-1)) || (rowCounter == totalRows))
{
ignoreMe = true;
//If it reached the last closing tag, exit out of this to allow it to write the rest of the file.
if (rowCloser==totalRows)
rowCounter++;
}
else
{
ignoreMe = false;
}
//copy over other lines
if (!ignoreMe)
{
pw.println(line);
pw.flush();
}
}
br.close();
pw.close();
//Delete the original file
if (!inFile.delete()) {
System.out.println("Could not delete original file");
return;
}
//Rename the new file to the filename the original file had.
if (!tempFile.renameTo(inFile))
System.out.println("Could not rename temp file");
}
} catch (Exception ex) {
ex.printStackTrace();
}
}
This is the XML file before it goes through removeLineFromFile:
<?xml version="1.0" encoding="utf-16"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
<Author>marc</Author>
<LastAuthor>ESDI</LastAuthor>
</DocumentProperties>
<ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
<WindowHeight>7560</WindowHeight>
<WindowWidth>12300</WindowWidth>
<WindowTopX>360</WindowTopX>
<WindowTopY>135</WindowTopY>
<ProtectStructure>False</ProtectStructure>
<ProtectWindows>False</ProtectWindows>
</ExcelWorkbook>
<Styles>
<Style ss:ID="Default" ss:Name="Normal">
<Alignment ss:Vertical="Bottom"/>
<Borders/>
<Font/>
<Interior/>
<NumberFormat/>
<Protection/>
</Style>
<Style ss:ID="s21">
<NumberFormat ss:Format="Short Date"/>
</Style>
</Styles>
<Worksheet ss:Name="Sheet1">
<Table x:FullColumns="1" x:FullRows="1">
<Row>
<Cell><Data ss:Type="String">List for Work Request(F7BAH1P)</Data></Cell>
</Row>
<Row>
</Row>
<Row>
<Cell><Data ss:Type="String">Crt. Dte</Data></Cell>
<Cell><Data ss:Type="String">WR Status</Data></Cell>
<Cell><Data ss:Type="String">Request Plant</Data></Cell>
<Cell><Data ss:Type="String">Request #</Data></Cell>
<Cell><Data ss:Type="String">Item#</Data></Cell>
<Cell><Data ss:Type="String">Request Cost Center</Data></Cell>
<Cell><Data ss:Type="String">WR Description</Data></Cell>
<Cell><Data ss:Type="String">W/O No</Data></Cell>
<Cell><Data ss:Type="String">Charge Plant</Data></Cell>
<Cell><Data ss:Type="String">Charge Cost Center</Data></Cell>
<Cell><Data ss:Type="String">Equip NO</Data></Cell>
<Cell><Data ss:Type="String">Equipment Name</Data></Cell>
<Cell><Data ss:Type="String">Required Date</Data></Cell>
<Cell><Data ss:Type="String">WO Type</Data></Cell>
<Cell><Data ss:Type="String">Exec. C/C</Data></Cell>
<Cell><Data ss:Type="String">Exec. Plant</Data></Cell>
<Cell><Data ss:Type="String">Plant1</Data></Cell>
<Cell><Data ss:Type="String">Area</Data></Cell>
<Cell><Data ss:Type="String">Confirmed</Data></Cell>
<Cell><Data ss:Type="String">WO Status</Data></Cell>
<Cell><Data ss:Type="String">W/R Requester</Data></Cell>
</Row>
<Row>
</Row>
<Row>
<Cell><Data ss:Type="String">Count: 244</Data></Cell>
</Row>
</Table>
<WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
<Selected/>
<ProtectObjects>False</ProtectObjects>
<ProtectScenarios>False</ProtectScenarios>
</WorksheetOptions>
</Worksheet>
</Workbook>
Answer 0 (score: 3)
You are probably getting the parse error because your file's actual encoding does not match the encoding in the XML declaration:
<?xml version="1.0" encoding="utf-16"?>
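FileWriter and FileReader assume that the default character encoding is acceptable (UTF-8 on my system). You cannot rely on them to handle UTF-16 encoded files portably. Here is what their documentation says:
FileWriter: Convenience class for writing character files. The constructors of this class assume that the default character encoding and the default byte-buffer size are acceptable. To specify these values yourself, construct an OutputStreamWriter on a FileOutputStream.
FileReader: Convenience class for reading character files. The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate. To specify these values yourself, construct an InputStreamReader on a FileInputStream.
So you need to do what the documentation suggests and use the alternatives.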
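As a minimal sketch of what the documentation describes, the following copies a text file while naming the charset explicitly on both the reading and the writing side. The class name Utf16Copy and the method copy are hypothetical and exist only for illustration; the sketch assumes the input really is UTF-16 and Java 7 or later.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original question or answer.
class Utf16Copy {
    // Copies source to target line by line, decoding and encoding as UTF-16
    // instead of relying on the platform default charset.
    static void copy(File source, File target) throws IOException {
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                     new FileInputStream(source), StandardCharsets.UTF_16));
             PrintWriter pw = new PrintWriter(new OutputStreamWriter(
                     new FileOutputStream(target), StandardCharsets.UTF_16))) {
            String line;
            while ((line = br.readLine()) != null) {
                // Any per-line filtering would go here.
                pw.println(line);
            }
        }
    }
}
```

The try-with-resources block closes both streams even if an exception is thrown, so no explicit flush or close calls are needed.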
Here is some quick test code that demonstrates your problem using three different implementations of your removeLineFromFile method:
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class Encoding {
private static File removeLineFromFile2(String file) {
File ret = null;
BufferedReader br = null;
PrintWriter pw = null;
try {
File inFile = new File(file);
if (!inFile.isFile()) {
return ret;
}
ret = inFile;
br = new BufferedReader(new InputStreamReader(
new FileInputStream(file), "UTF-16"));
String line = null;
int totalRows=0;
boolean continueMethod = false;
//Count total number of rows in file
while ((line = br.readLine()) != null) {
//check if file is already formatted
if (line.contains("List for Work")){
continueMethod = true;
}
if (line.toLowerCase().contains("</row>")){
++totalRows;
}
}
if (continueMethod)
{
//Create a temporary file to hold the file with deleted lines.
File tempFile = new File(inFile.getAbsolutePath() + ".2.tmp");
pw = new PrintWriter(new OutputStreamWriter(
new FileOutputStream(tempFile), "UTF-16"));
line = null;
br.close();
br = null;
br = new BufferedReader(new InputStreamReader(
new FileInputStream(file), "UTF-16"));
boolean ignoreMe = false;
int rowCounter = 0;
int rowCloser = 0;
//begin cycling through file and writing to new one.
while((line = br.readLine()) != null)
{
//if runs into a row, count it.
if (line.toLowerCase().contains("<row>")){
rowCounter++;
}
if (line.toLowerCase().contains("</row>")){
rowCloser++;
}
//Delete the first two, and last two lines
if ((rowCounter == 1 ) || (rowCounter == 2) || (rowCounter == (totalRows-1)) || (rowCounter == totalRows))
{
ignoreMe = true;
//If it reached the last closing tag, exit out of this to allow it to write the rest of the file.
if (rowCloser==totalRows)
rowCounter++;
}
else
{
ignoreMe = false;
}
//copy over other lines
if (!ignoreMe)
{
pw.println(line);
pw.flush();
}
}
br.close();
pw.close();
System.out.println("Temp file is: " + tempFile.getAbsolutePath());
ret = tempFile;
}
} catch (Exception ex) {
ex.printStackTrace();
}
return ret;
}
private static File removeLineFromFile1(String file) {
File ret = null;
BufferedReader br = null;
PrintWriter pw = null;
try {
File inFile = new File(file);
if (!inFile.isFile()) {
return ret;
}
ret = inFile;
br = new BufferedReader(new InputStreamReader(
new FileInputStream(file), "UTF-16"));
String line = null;
int totalRows=0;
boolean continueMethod = false;
//Count total number of rows in file
while ((line = br.readLine()) != null) {
//check if file is already formatted
if (line.contains("List for Work")){
continueMethod = true;
}
if (line.toLowerCase().contains("</row>")){
++totalRows;
}
}
if (continueMethod)
{
//Create a temporary file to hold the file with deleted lines.
File tempFile = new File(inFile.getAbsolutePath() + ".1.tmp");
pw = new PrintWriter(new FileWriter(tempFile));
line = null;
br.close();
br = null;
br = new BufferedReader(new InputStreamReader(
new FileInputStream(file), "UTF-16"));
boolean ignoreMe = false;
int rowCounter = 0;
int rowCloser = 0;
//begin cycling through file and writing to new one.
while((line = br.readLine()) != null)
{
//if runs into a row, count it.
if (line.toLowerCase().contains("<row>")){
rowCounter++;
}
if (line.toLowerCase().contains("</row>")){
rowCloser++;
}
//Delete the first two, and last two lines
if ((rowCounter == 1 ) || (rowCounter == 2) || (rowCounter == (totalRows-1)) || (rowCounter == totalRows))
{
ignoreMe = true;
//If it reached the last closing tag, exit out of this to allow it to write the rest of the file.
if (rowCloser==totalRows)
rowCounter++;
}
else
{
ignoreMe = false;
}
//copy over other lines
if (!ignoreMe)
{
pw.println(line);
pw.flush();
}
}
br.close();
pw.close();
System.out.println("Temp file is: " + tempFile.getAbsolutePath());
ret = tempFile;
}
} catch (Exception ex) {
ex.printStackTrace();
}
return ret;
}
private static File removeLineFromFile(String file) {
File ret = null;
BufferedReader br = null;
PrintWriter pw = null;
try {
File inFile = new File(file);
if (!inFile.isFile()) {
return ret;
}
ret = inFile;
br = new BufferedReader(new FileReader(file));
String line = null;
int totalRows=0;
boolean continueMethod = false;
//Count total number of rows in file
while ((line = br.readLine()) != null) {
//check if file is already formatted
if (line.contains("List for Work")){
continueMethod = true;
}
if (line.toLowerCase().contains("</row>")){
++totalRows;
}
}
if (continueMethod)
{
//Create a temporary file to hold the file with deleted lines.
File tempFile = new File(inFile.getAbsolutePath() + ".tmp");
pw = new PrintWriter(new FileWriter(tempFile));
line = null;
br.close();
br = null;
br = new BufferedReader(new FileReader(file));
boolean ignoreMe = false;
int rowCounter = 0;
int rowCloser = 0;
//begin cycling through file and writing to new one.
while((line = br.readLine()) != null)
{
//if runs into a row, count it.
if (line.toLowerCase().contains("<row>")){
rowCounter++;
}
if (line.toLowerCase().contains("</row>")){
rowCloser++;
}
//Delete the first two, and last two lines
if ((rowCounter == 1 ) || (rowCounter == 2) || (rowCounter == (totalRows-1)) || (rowCounter == totalRows))
{
ignoreMe = true;
//If it reached the last closing tag, exit out of this to allow it to write the rest of the file.
if (rowCloser==totalRows)
rowCounter++;
}
else
{
ignoreMe = false;
}
//copy over other lines
if (!ignoreMe)
{
pw.println(line);
pw.flush();
}
}
br.close();
pw.close();
System.out.println("Temp file is: " + tempFile.getAbsolutePath());
ret = tempFile;
}
} catch (Exception ex) {
ex.printStackTrace();
}
return ret;
}
private static void parse(File file) {
try {
System.out.println("Parsing " + file.getAbsolutePath());
SAXParserFactory parserFactor = SAXParserFactory.newInstance();
SAXParser parser = parserFactor.newSAXParser();
DefaultHandler handler = new DefaultHandler();
parser.parse(file, handler);
} catch (Exception ex) {
System.out.println("An exception occurred: " + ex.getMessage());
} finally {
System.out.println("Done with " + file.getAbsolutePath());
}
}
public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
System.out.println("getAndParseFile");
String fileName=args[0];
File file = new File(fileName);
File f2 = removeLineFromFile2(file.getAbsolutePath());
File f1 = removeLineFromFile1(file.getAbsolutePath());
File f = removeLineFromFile(file.getAbsolutePath());
System.out.println("Finished Removing Lines");
parse(f2);
parse(f1);
parse(f);
}
}
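removeLineFromFile2 shows what you need to do. removeLineFromFile1 shows what happens if you read the content correctly but write it out the wrong way (which I suspect is what is happening in your case). removeLineFromFile is your implementation, which does nothing on my system. The output: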
getAndParseFile
Temp file is: \path\to\sample-utf16.xml.2.tmp
Temp file is: \path\to\sample-utf16.xml.1.tmp
Finished Removing Lines
Parsing \path\to\sample-utf16.xml.2.tmp
Done with \path\to\sample-utf16.xml.2.tmp
Parsing \path\to\sample-utf16.xml.1.tmp
An exception occurred: Content is not allowed in prolog.
Done with \path\to\sample-utf16.xml.1.tmp
Parsing \path\to\sample-utf16.xml
Done with \path\to\sample-utf16.xml
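All of the above assumes that your input file really is UTF-16, as its XML declaration says. I suspect that is not the case. If you created the file yourself, you did it the wrong way. Try opening it in Notepad++ (or a similar tool) and check the encoding via the Encoding menu (it should say UCS-2 or UTF-16, not ANSI, UTF-8, etc.).
Your code should always explicitly specify the encoding of the files it works with.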
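If you would rather check the encoding from code than from an editor, one rough option is to look at the byte-order mark at the start of the file. The sketch below is an assumption-laden illustration, not a full encoding detector: the class BomCheck and the method guessEncodingFromBom are hypothetical names, and a file with no BOM simply comes back as unknown.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of the original question or answer.
class BomCheck {
    // Returns a best-effort guess based only on a leading byte-order mark.
    // Note: a UTF-32LE BOM (FF FE 00 00) would be reported as UTF-16LE here.
    static String guessEncodingFromBom(String path) throws IOException {
        byte[] head = new byte[3];
        int n = 0;
        try (InputStream in = new FileInputStream(path)) {
            int r;
            // Read up to three bytes; short files may yield fewer.
            while (n < head.length && (r = in.read(head, n, head.length - n)) != -1) {
                n += r;
            }
        }
        if (n >= 3 && (head[0] & 0xFF) == 0xEF && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return "UTF-8 (BOM)";
        }
        if (n >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return "UTF-16BE (BOM)";
        }
        if (n >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return "UTF-16LE (BOM)";
        }
        return "unknown (no BOM)";
    }
}
```

If this reports anything other than a UTF-16 BOM for a file whose declaration says encoding="utf-16", the declaration and the actual bytes probably disagree, which is the mismatch described above.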