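I am trying to read an Excel file with more than 100,000 rows in Java using Apache POI, but I am running into a few problems:
1) Extracting the data from the Excel file takes 10 to 15 minutes.
2) While the code runs, my laptop starts to hang. That makes it hard to get at the data, and I then have to restart the laptop.
Is there another way to get the data out of an Excel file in less time using Java?
Here is my current code: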
public class ReadRfdsDump {
    public void readRfdsDump() {
        try {
            FileInputStream file = new FileInputStream(new File("C:\\Users\\esatnir\\Videos\\sprint\\sprintvision.sprint.com_Trackor Browser_RF Design Sheet_07062018122740.xlsx"));
            XSSFWorkbook workbook = new XSSFWorkbook(file);
            XSSFSheet sheet = workbook.getSheetAt(0);
            DataFormatter df = new DataFormatter();
            for (int i = 0; i < 2; i++) {
                Row row = sheet.getRow(i);
                System.out.println(df.formatCellValue(row.getCell(1)));
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Answer 0 (score: 2)
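By default, when Apache POI opens a workbook using WorkbookFactory.create or new XSSFWorkbook, it always parses the whole workbook, including all sheets. If the workbook contains a lot of data, this leads to high memory usage. Opening the workbook from a File instead of an InputStream reduces memory usage. But that causes other problems, because the file in use then cannot be overwritten, at least not for *.xlsx files.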
There is XSSF and SAX (Event API), which gets at the underlying XML data and processes it using SAX.
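To give a feel for event-style processing, here is a minimal sketch that uses only the JDK's own SAX parser on a hand-written sheet fragment. It does not use the POI event API itself; the class name SheetSaxSketch, the SAMPLE fragment and the output format are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class SheetSaxSketch {

    // Hand-written stand-in for a worksheet part (assumption, not taken from a real file).
    static final String SAMPLE = "<worksheet><sheetData>"
            + "<row r=\"1\"><c r=\"A1\"><v>1.5</v></c><c r=\"B1\" t=\"s\"><v>0</v></c></row>"
            + "</sheetData></worksheet>";

    // Collects one line per <v> element: cell reference, cell type, raw value.
    static List<String> readCells(String xml) throws Exception {
        List<String> cells = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private String ref;
            private String type;
            private StringBuilder value; // non-null only while inside <v>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("c".equals(qName)) {
                    ref = atts.getValue("r");
                    type = atts.getValue("t");
                } else if ("v".equals(qName)) {
                    value = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may deliver text in several chunks, so append.
                if (value != null) {
                    value.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("v".equals(qName)) {
                    cells.add(ref + " type=" + type + " raw=" + value);
                    value = null;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return cells;
    }

    public static void main(String[] args) throws Exception {
        readCells(SAMPLE).forEach(System.out::println);
    }
}
```

Only the cell currently being read is held in memory, which is why this style copes with very large sheets.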
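But if we are already at the level where we get the underlying XML data and process it ourselves, then we can just as well go one step further back.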
A *.xlsx file is a ZIP archive that contains the data in XML files within a directory structure. So we can unzip the *.xlsx file and get the data from the XML files.
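A quick way to see this for yourself is to list the archive entries with java.util.zip. The sketch below is only an illustration: the class name XlsxPartLister and the writeSampleArchive helper (which writes a tiny stand-in archive so the example runs without a real workbook) are assumptions.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

final class XlsxPartLister {

    // Returns the names of all entries stored in the given ZIP archive.
    static List<String> listParts(Path archive) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(archive.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    // Hypothetical helper: writes a small archive with xlsx-like entry names.
    // It is NOT a valid workbook, just enough to demonstrate the listing.
    static Path writeSampleArchive() throws IOException {
        Path file = Files.createTempFile("sample", ".xlsx");
        String[] names = {"xl/workbook.xml", "xl/sharedStrings.xml", "xl/worksheets/sheet1.xml"};
        try (OutputStream out = Files.newOutputStream(file);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            for (String name : names) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("<x/>".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        return file;
    }

    public static void main(String[] args) throws IOException {
        // Pass the path of a real *.xlsx file as the first argument to inspect it.
        Path archive = args.length > 0 ? Paths.get(args[0]) : writeSampleArchive();
        listParts(archive).forEach(System.out::println);
    }
}
```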
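In there we find /xl/sharedStrings.xml, which holds all string cell values, and /xl/workbook.xml, which describes the workbook structure. There are also /xl/worksheets/sheet1.xml, /xl/worksheets/sheet2.xml, ..., which store the sheet data, and /xl/styles.xml, which holds the style settings for all cells in the sheets.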
So all we need is a way to work with ZIP file systems in Java. This is supported through java.nio.file.FileSystems.
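As a minimal sketch of that idea, the following opens an archive as a file system and reads one part as text. The class name ZipFileSystemDemo and the writeSampleArchive helper are assumptions for illustration. Reading a whole part into a String is only sensible for small parts; large sheets should be streamed.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class ZipFileSystemDemo {

    // Opens the archive as a file system and returns the named part as UTF-8 text.
    static String readPart(Path archive, String partName) throws IOException {
        // The cast picks the (Path, ClassLoader) overload; a bare null is
        // ambiguous on newer JDKs that also offer a (Path, Map) overload.
        try (FileSystem fs = FileSystems.newFileSystem(archive, (ClassLoader) null)) {
            Path part = fs.getPath(partName);
            return new String(Files.readAllBytes(part), StandardCharsets.UTF_8);
        }
    }

    // Hypothetical helper: writes a one-entry archive so the example runs
    // without a real workbook.
    static Path writeSampleArchive() throws IOException {
        Path file = Files.createTempFile("sample", ".xlsx");
        try (OutputStream out = Files.newOutputStream(file);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry("xl/workbook.xml"));
            zip.write("<workbook/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path archive = writeSampleArchive();
        System.out.println(readPart(archive, "/xl/workbook.xml"));
        Files.delete(archive);
    }
}
```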
And we need a way to parse XML. The package javax.xml.stream is my favorite.
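A small, self-contained sketch of javax.xml.stream at work, reading shared strings from an in-memory fragment. The class name SharedStringsReader and the SAMPLE fragment are assumptions. It collects only the text inside t elements, so a rich-text item made of several runs is joined into one string.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

final class SharedStringsReader {

    // Hand-written stand-in for /xl/sharedStrings.xml (assumption for the demo).
    static final String SAMPLE =
            "<sst xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">"
            + "<si><t>Name</t></si>"
            + "<si><r><t>Hello </t></r><r><t>World</t></r></si>"
            + "</sst>";

    // Returns one string per <si> element, in document order.
    static List<String> readSharedStrings(InputStream in) throws XMLStreamException {
        List<String> result = new ArrayList<>();
        StringBuilder current = null; // non-null while inside <si>
        boolean inText = false;       // true while inside <t>
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    String name = event.asStartElement().getName().getLocalPart();
                    if ("si".equals(name)) {
                        current = new StringBuilder();
                    } else if ("t".equals(name) && current != null) {
                        inText = true;
                    }
                } else if (event.isCharacters() && inText) {
                    current.append(event.asCharacters().getData());
                } else if (event.isEndElement()) {
                    String name = event.asEndElement().getName().getLocalPart();
                    if ("t".equals(name)) {
                        inText = false;
                    } else if ("si".equals(name) && current != null) {
                        result.add(current.toString());
                        current = null;
                    }
                }
            }
        } finally {
            reader.close();
        }
        return result;
    }

    public static void main(String[] args) throws XMLStreamException {
        InputStream in = new ByteArrayInputStream(SAMPLE.getBytes(StandardCharsets.UTF_8));
        System.out.println(readSharedStrings(in)); // prints [Name, Hello World]
    }
}
```

A cell with t="s" stores an index into this list rather than the text itself.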
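The following shows a working draft. It parses /xl/sharedStrings.xml. It also parses /xl/styles.xml, but only gets the number formats and the cell number format settings. The number format settings are essential for detecting date/time values. Then it parses /xl/worksheets/sheet1.xml, which contains the data of the first sheet. To detect whether a number format is a date format, and therefore whether the formatted cell contains a date/time value, one single apache poi class, org.apache.poi.ss.usermodel.DateUtil, is used. This is done to simplify the code. Of course we could have written even this class ourselves.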
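To illustrate what writing that part ourselves might involve, here is a rough sketch of the two pieces: checking for a built-in date format id and converting a serial number to a date. This is not how DateUtil is implemented. The class name ExcelSerialDate is an assumption, only the 1900 date system is handled, custom format strings are not inspected, and serial values below 61 are rejected because of the nonexistent 1900-02-29 that Excel counts.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;

final class ExcelSerialDate {

    // Day zero of the 1900 date system, shifted by one day to absorb the
    // fictitious 1900-02-29 (valid for serial values of 61 and above).
    private static final LocalDate EPOCH = LocalDate.of(1899, 12, 30);

    // True for the built-in number format ids that denote dates or times
    // (14 to 22 and 45 to 47). Custom formats would need their format
    // string inspected, which this sketch does not attempt.
    static boolean isBuiltInDateFormat(int numFmtId) {
        return (numFmtId >= 14 && numFmtId <= 22) || (numFmtId >= 45 && numFmtId <= 47);
    }

    // Converts a serial date (whole part = days, fraction = time of day).
    static LocalDateTime toLocalDateTime(double serial) {
        if (serial < 61) {
            throw new IllegalArgumentException("serial values below 61 are not handled: " + serial);
        }
        long wholeDays = (long) Math.floor(serial);
        long secondsOfDay = Math.round((serial - wholeDays) * 86_400);
        return EPOCH.plusDays(wholeDays).atStartOfDay().plusSeconds(secondsOfDay);
    }

    public static void main(String[] args) {
        System.out.println(toLocalDateTime(43287.5)); // prints 2018-07-06T12:00
    }
}
```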
import java.nio.file.Paths;
import java.nio.file.Path;
import java.nio.file.Files;
import java.nio.file.FileSystems;
import java.nio.file.FileSystem;
import javax.xml.stream.*;
import javax.xml.stream.events.*;
import javax.xml.namespace.QName;
import java.util.List;
import java.util.ArrayList;
import java.util.Map;
import java.util.HashMap;
import java.util.Date;
import org.apache.poi.ss.usermodel.DateUtil;

public class UnZipAndReadXLSXFileSystem {

    public static void main (String args[]) throws Exception {

        XMLEventReader reader = null;
        XMLEvent event = null;
        Attribute attribute = null;
        StartElement startElement = null;
        EndElement endElement = null;
        String characters = null;
        StringBuilder stringValue = new StringBuilder(); //for collecting the characters to complete values
        List<String> sharedStrings = new ArrayList<String>(); //list of shared strings
        Map<String, String> numberFormats = new HashMap<String, String>(); //map of number formats
        List<String> cellNumberFormats = new ArrayList<String>(); //list of cell number formats

        Path source = Paths.get("ExcelExample.xlsx"); //path to the Excel file
        //get filesystem of Excel file; the cast avoids an ambiguous overload on newer JDKs
        FileSystem fs = FileSystems.newFileSystem(source, (ClassLoader) null);

        //get shared strings ==============================================================================
        Path sharedStringsTable = fs.getPath("/xl/sharedStrings.xml");
        reader = XMLInputFactory.newInstance().createXMLEventReader(Files.newInputStream(sharedStringsTable));
        boolean siFound = false;
        while (reader.hasNext()) {
            event = (XMLEvent)reader.next();
            if (event.isStartElement()){
                startElement = (StartElement)event;
                if (startElement.getName().getLocalPart().equalsIgnoreCase("si")) {
                    //start element of shared string item
                    siFound = true;
                    stringValue = new StringBuilder();
                }
            } else if (event.isCharacters() && siFound) {
                //chars of the shared string item
                characters = event.asCharacters().getData();
                stringValue.append(characters);
            } else if (event.isEndElement() ) {
                endElement = (EndElement)event;
                if (endElement.getName().getLocalPart().equalsIgnoreCase("si")) {
                    //end element of shared string item
                    siFound = false;
                    sharedStrings.add(stringValue.toString());
                }
            }
        }
        reader.close();
        System.out.println(sharedStrings);
        //shared strings ==================================================================================

        //get styles, number formats are essential for detecting date / time values =======================
        Path styles = fs.getPath("/xl/styles.xml");
        reader = XMLInputFactory.newInstance().createXMLEventReader(Files.newInputStream(styles));
        boolean cellXfsFound = false;
        while (reader.hasNext()) {
            event = (XMLEvent)reader.next();
            if (event.isStartElement()){
                startElement = (StartElement)event;
                if (startElement.getName().getLocalPart().equalsIgnoreCase("numFmt")) {
                    //start element of number format
                    attribute = startElement.getAttributeByName(new QName("numFmtId"));
                    String numFmtId = attribute.getValue();
                    attribute = startElement.getAttributeByName(new QName("formatCode"));
                    numberFormats.put(numFmtId, ((attribute != null)?attribute.getValue():"null"));
                } else if (startElement.getName().getLocalPart().equalsIgnoreCase("cellXfs")) {
                    //start element of cell format setting
                    cellXfsFound = true;
                } else if (startElement.getName().getLocalPart().equalsIgnoreCase("xf") && cellXfsFound ) {
                    //start element of format setting in cell format setting
                    attribute = startElement.getAttributeByName(new QName("numFmtId"));
                    cellNumberFormats.add(((attribute != null)?attribute.getValue():"null"));
                }
            } else if (event.isEndElement() ) {
                endElement = (EndElement)event;
                if (endElement.getName().getLocalPart().equalsIgnoreCase("cellXfs")) {
                    //end element of cell format setting
                    cellXfsFound = false;
                }
            }
        }
        reader.close();
        System.out.println(numberFormats);
        System.out.println(cellNumberFormats);
        //styles ==========================================================================================

        //get sheet data of first sheet ===================================================================
        Path sheet1 = fs.getPath("/xl/worksheets/sheet1.xml");
        reader = XMLInputFactory.newInstance().createXMLEventReader(Files.newInputStream(sheet1));
        boolean rowFound = false;
        boolean cellFound = false;
        boolean cellValueFound = false;
        boolean inlineStringFound = false;
        String cellStyle = null;
        String cellType = null;
        while (reader.hasNext()) {
            event = (XMLEvent)reader.next();
            if (event.isStartElement()){
                startElement = (StartElement)event;
                if (startElement.getName().getLocalPart().equalsIgnoreCase("row")) {
                    //start element of row
                    rowFound = true;
                    System.out.print("<Row");
                    attribute = startElement.getAttributeByName(new QName("r"));
                    System.out.print(" r=" + ((attribute != null)?attribute.getValue():"null"));
                    System.out.println(">");
                } else if (startElement.getName().getLocalPart().equalsIgnoreCase("c") && rowFound) {
                    //start element of cell in row
                    cellFound = true;
                    System.out.print("<Cell");
                    attribute = startElement.getAttributeByName(new QName("r"));
                    System.out.print(" r=" + ((attribute != null)?attribute.getValue():"null"));
                    attribute = startElement.getAttributeByName(new QName("t"));
                    System.out.print(" t=" + ((attribute != null)?attribute.getValue():"null"));
                    cellType = ((attribute != null)?attribute.getValue():null);
                    attribute = startElement.getAttributeByName(new QName("s"));
                    System.out.print(" s=" + ((attribute != null)?attribute.getValue():"null"));
                    cellStyle = ((attribute != null)?attribute.getValue():null);
                    System.out.print(">");
                } else if (startElement.getName().getLocalPart().equalsIgnoreCase("v") && cellFound) {
                    //start element of value in cell
                    cellValueFound = true;
                    System.out.print("<V>");
                    stringValue = new StringBuilder();
                } else if (startElement.getName().getLocalPart().equalsIgnoreCase("is") && cellFound) {
                    //start element of inline string in cell
                    inlineStringFound = true;
                    System.out.print("<Is>");
                    stringValue = new StringBuilder();
                }
            } else if (event.isCharacters() && cellFound && (cellValueFound || inlineStringFound)) {
                //chars of the cell value or the inline string
                characters = event.asCharacters().getData();
                stringValue.append(characters);
            } else if (event.isEndElement()) {
                endElement = (EndElement)event;
                if (endElement.getName().getLocalPart().equalsIgnoreCase("row")) {
                    //end element of row
                    rowFound = false;
                    System.out.println("</Row>");
                } else if (endElement.getName().getLocalPart().equalsIgnoreCase("c")) {
                    //end element of cell
                    cellFound = false;
                    System.out.println("</Cell>");
                } else if (endElement.getName().getLocalPart().equalsIgnoreCase("v")) {
                    //end element of value
                    cellValueFound = false;
                    String cellValue = stringValue.toString();
                    if ("s".equals(cellType)) {
                        //shared string cells store an index into the shared strings list
                        cellValue = sharedStrings.get(Integer.valueOf(cellValue));
                    }
                    if (cellStyle != null) {
                        int s = Integer.valueOf(cellStyle);
                        String formatIndex = cellNumberFormats.get(s);
                        String formatString = numberFormats.get(formatIndex);
                        if (DateUtil.isADateFormat(Integer.valueOf(formatIndex), formatString)) {
                            double dDate = Double.parseDouble(cellValue);
                            Date date = DateUtil.getJavaDate(dDate);
                            cellValue = date.toString();
                        }
                    }
                    System.out.print(cellValue);
                    System.out.print("</V>");
                } else if (endElement.getName().getLocalPart().equalsIgnoreCase("is")) {
                    //end element of inline string
                    inlineStringFound = false;
                    String cellValue = stringValue.toString();
                    System.out.print(cellValue);
                    System.out.print("</Is>");
                }
            }
        }
        reader.close();
        //sheet data ======================================================================================

        fs.close();
    }
}
Answer 1 (score: 0)
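Apache POI is your friend, that is true. But I ran into OutOfMemory errors when reading a very large Excel file with formulas.
My solution: if you only want to read the data from an XLSX file and do not care about the formulas, you can read it as plain XML files and extract the data from those, which turned out to be easy for us.
In the xl\worksheets folder you will find one XML file per sheet.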
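A minimal sketch of that approach, using only JDK classes: it walks the worksheet entries of the archive and streams each one, here simply counting row elements. The class name WorksheetRowCounter and the writeSampleArchive helper (which writes a tiny stand-in archive, not a valid workbook) are assumptions for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class WorksheetRowCounter {

    // Maps each worksheet entry name to the number of <row> elements it contains.
    static Map<String, Integer> countRows(Path xlsx) throws IOException, XMLStreamException {
        Map<String, Integer> counts = new TreeMap<>();
        XMLInputFactory factory = XMLInputFactory.newInstance();
        try (ZipFile zip = new ZipFile(xlsx.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName();
                if (!name.startsWith("xl/worksheets/") || !name.endsWith(".xml")) {
                    continue; // not a worksheet part
                }
                try (InputStream in = zip.getInputStream(entry)) {
                    // The cursor API reads the XML as a stream and never
                    // builds the whole sheet in memory.
                    XMLStreamReader reader = factory.createXMLStreamReader(in);
                    int rows = 0;
                    while (reader.hasNext()) {
                        if (reader.next() == XMLStreamConstants.START_ELEMENT
                                && "row".equals(reader.getLocalName())) {
                            rows++;
                        }
                    }
                    reader.close();
                    counts.put(name, rows);
                }
            }
        }
        return counts;
    }

    // Hypothetical helper: two hand-written sheets so the example runs as is.
    static Path writeSampleArchive() throws IOException {
        Path file = Files.createTempFile("sample", ".xlsx");
        try (OutputStream out = Files.newOutputStream(file);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry("xl/worksheets/sheet1.xml"));
            zip.write("<worksheet><sheetData><row r=\"1\"/><row r=\"2\"/></sheetData></worksheet>"
                    .getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("xl/worksheets/sheet2.xml"));
            zip.write("<worksheet><sheetData><row r=\"1\"/></sheetData></worksheet>"
                    .getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return file;
    }

    public static void main(String[] args) throws Exception {
        Path archive = writeSampleArchive();
        System.out.println(countRows(archive));
        Files.delete(archive);
    }
}
```

Cell values can be extracted the same way by reacting to the c and v elements instead of counting rows.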