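I have read and tried Raymond Camden's tutorial on reading Office documents and displaying them in a CFM page. Because the tutorial was written several years ago, it only reads doc, ppt and xls files, not the newer formats such as docx, pptx and xlsx. How can I read the newer files successfully? Renaming the file extension does not work with Apache POI.
My code: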
<!--- where the poi files are --->
<cfset jarpath = expandPath("./jars")>
<cfset paths = []>
<cfdirectory action="list" name="files" directory="#jarpath#" filter="*.jar" recurse="true">
<cfloop query="files">
<cfset arrayAppend(paths, directory & "/" & name)>
</cfloop>
<!--- load javaloader --->
<cfset variables.loader = createObject("component", "javaloader.JavaLoader").init(paths)>
<!--- generic file reader doohicky --->
<cfset myfile = createObject("java","java.io.FileInputStream")>
<!--- get our required things loaded --->
<!--- Word --->
<cfset doc = loader.create("org.apache.poi.hwpf.HWPFDocument")>
<cfset wordext = loader.create("org.apache.poi.hwpf.extractor.WordExtractor")>
<!--- Excel --->
<cfset excel = loader.create("org.apache.poi.hssf.usermodel.HSSFWorkbook")>
<cfset xlsext = loader.create("org.apache.poi.hssf.extractor.ExcelExtractor")>
<!--- Powerpoint --->
<cfset ppt = loader.create("org.apache.poi.hslf.HSLFSlideShow")>
<cfset pptext = loader.create("org.apache.poi.hslf.extractor.PowerPointExtractor")>
<!--- get files --->
<cfset filePath = expandPath("./testdocs")>
<cfdirectory action="list" name="files" directory="#filePath#">
<cfoutput query="files">
<cfset theFile = filePath & "/" & name>
<cfset myfile.init(theFile)>
Reading: #theFile#<br/>
<cfswitch expression="#listLast(name,".")#">
<cfcase value="doc,docx">
<cfset finalfile = Replace(theFile, listLast(name,"."), "doc")>
<cfset doc = doc.init(finalfile)>
<cfset wordext.init(doc)>
<cfoutput>
<pre>
#wordext.getText()#
</pre>
</cfoutput>
</cfcase>
<cfcase value="xls,xlsx">
<cfset finalfile = Replace(theFile, listLast(name,"."), "xls")>
<cfset excel = excel.init(finalfile)>
<cfset xlsext = xlsext.init(excel)>
<cfoutput>
<pre>
#xlsext.getText()#
</pre>
</cfoutput>
</cfcase>
<cfcase value="ppt,pptx">
<cfset finalfile = Replace(theFile, listLast(name,"."), "ppt")>
<cfset ppt = ppt.init(finalfile)>
<cfset pptext = pptext.init(ppt)>
<cfoutput>
<pre>
#pptext.getText(true,true)#
</pre>
</cfoutput>
</cfcase>
</cfswitch>
<p><hr/></p>
</cfoutput>
This is a short description of the error:
Object instantiation exception.
An exception occurred while instantiating a Java object. The class must not be an interface or an abstract class. Error: ''.
The error occurred in C:/ColdFusion11/cfusion/wwwroot/TicketOnThePlane/test.cfm: line 66
64 : <cfcase value="ppt,pptx">
65 : <cfset finalfile = Replace(theFile, listLast(name,"."), "ppt")>
66 : <cfset ppt = ppt.init(finalfile)>
67 : <cfset pptext = pptext.init(ppt)>
68 : <cfoutput>
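Answer 0 (score: 2)
The classes you are using only work with binary Office files, i.e. Office 97-2003. Changing the file extension does not work, because that does not actually change the format of the file. Internally, the files are still OOXML.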
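To see why renaming cannot help, one can look at the first bytes of a file instead of its extension. The snippet below is only a minimal sketch: `sniffOfficeFormat` is a hypothetical helper name, the path in the usage line is made up, and it reads the whole file into memory, so it is meant for small test files. Binary Office files start with the OLE2 signature `D0 CF 11 E0 A1 B1 1A E1`, while OOXML files are zip archives and start with `50 4B 03 04`.

```cfm
<cfscript>
// Hypothetical helper: guess the container format from the leading bytes,
// ignoring whatever extension the file happens to have.
function sniffOfficeFormat(required string path) {
    // Hex-encode the file content so the signature can be compared as text.
    var hex = uCase(binaryEncode(fileReadBinary(arguments.path), "hex"));
    if (compare(left(hex, 16), "D0CF11E0A1B11AE1") == 0) {
        return "binary (OLE2)";   // doc, xls, ppt
    }
    if (compare(left(hex, 8), "504B0304") == 0) {
        return "zip (possibly OOXML)";   // docx, xlsx, pptx are zip archives
    }
    return "unknown";
}

// Assumed example path; a renamed .xlsx would still report the zip signature.
writeOutput(sniffOfficeFormat(expandPath("./testdocs/renamed.xls")));
</cfscript>
```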
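If you want to extract text from different file types (and from both formats, i.e. binary and OOXML), use ExtractorFactory. It automatically determines the correct extractor for a given file (if supported). There is also another entry at the end of the link you posted that shows how to use the ExtractorFactory.
NB: This requires the OOXML POI jars.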
....
loader = createObject("component", "javaloader.JavaLoader").init(paths);
extractorFactory = loader.create("org.apache.poi.extractor.ExtractorFactory");
pathToFile = "c:/path/to/someFile.xlsx";
myfile = createObject("java","java.io.File").init(pathToFile);
extractor = extractorFactory.createExtractor(myFile);
WriteDump( extractor.getText());
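The same call could replace the per-extension cfswitch from the question. What follows is a rough sketch rather than tested code: it assumes `loader` is the JavaLoader instance created above, that the documents live in `./testdocs` as in the question, and that the POI OOXML jars and their dependencies are among the jars handed to JavaLoader.

```cfm
<cfscript>
extractorFactory = loader.create("org.apache.poi.extractor.ExtractorFactory");

// Full paths of everything in the test folder (no recursion).
docPaths = directoryList(expandPath("./testdocs"), false, "path");

for (docPath in docPaths) {
    writeOutput("Reading: " & encodeForHTML(docPath) & "<br/>");
    try {
        // Let POI pick the extractor from the file content, not the extension.
        docFile = createObject("java", "java.io.File").init(docPath);
        extractor = extractorFactory.createExtractor(docFile);
        writeOutput("<pre>" & encodeForHTML(extractor.getText()) & "</pre>");
    } catch (any e) {
        // Files POI cannot identify end up here.
        writeOutput("Skipped: " & encodeForHTML(e.message) & "<br/>");
    }
    writeOutput("<p><hr/></p>");
}
</cfscript>
```

With this approach the `doc`, `excel` and `ppt` objects from the question are no longer needed, since the factory creates the matching document and extractor itself.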
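Edit:
This is a short description of the error:
Side note: when working with Java objects, the error message text is often just a boilerplate message. You need to look at the stack trace to find the real "cause" of the error.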
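One way to get at the stack trace is to wrap the failing call in try/catch and print the `stackTrace` key of the caught exception. This is a small sketch that reuses the `ppt` and `finalfile` variables from the question; the surrounding markup is arbitrary.

```cfm
<cfscript>
try {
    ppt = ppt.init(finalfile);
} catch (any e) {
    // The message is usually the generic "Object instantiation exception."
    writeOutput("Message: " & encodeForHTML(e.message) & "<br/>");
    // The stack trace normally names the underlying Java exception.
    writeOutput("<pre>" & encodeForHTML(e.stackTrace) & "</pre>");
}
</cfscript>
```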