How to read Office documents (docx, xlsx, pptx, etc.) and display them in a ColdFusion .cfm page?

Posted: 2014-12-22 02:23:05

Tags: coldfusion apache-poi document coldfusion-10 file-conversion

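I have read and tried Raymond Camden's tutorial on reading Office documents and displaying them in a cfm page. The tutorial was written several years ago, so it can only read doc, ppt and xls files, not the newer docx, pptx, xlsx and similar formats. How can I read these files successfully? Renaming the file extension does not work with Apache POI.

My code: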

<!--- where the poi files are --->
<cfset jarpath = expandPath("./jars")>
<cfset paths = []>
<cfdirectory action="list" name="files" directory="#jarpath#" filter="*.jar" recurse="true">

<cfloop query="files">
<cfset arrayAppend(paths, directory & "/" & name)>
</cfloop>

<!--- load javaloader --->
<cfset variables.loader = createObject("component", "javaloader.JavaLoader").init(paths)>

<!--- generic file reader doohicky --->
<cfset myfile = createObject("java","java.io.FileInputStream")>

<!--- get our required things loaded --->

<!--- Word --->
<cfset doc = loader.create("org.apache.poi.hwpf.HWPFDocument")>
<cfset wordext = loader.create("org.apache.poi.hwpf.extractor.WordExtractor")>

<!--- Excel --->
<cfset excel = loader.create("org.apache.poi.hssf.usermodel.HSSFWorkbook")>
<cfset xlsext = loader.create("org.apache.poi.hssf.extractor.ExcelExtractor")>

<!--- Powerpoint --->
<cfset ppt = loader.create("org.apache.poi.hslf.HSLFSlideShow")>
<cfset pptext = loader.create("org.apache.poi.hslf.extractor.PowerPointExtractor")>

<!--- get files --->
<cfset filePath = expandPath("./testdocs")>
<cfdirectory action="list" name="files" directory="#filePath#">

<cfoutput query="files">
<cfset theFile = filePath & "/" & name>
<cfset myfile.init(theFile)>

Reading: #theFile#<br/>

<cfswitch expression="#listLast(name,".")#">

<cfcase value="doc,docx">
<cfset finalfile = Replace(theFile, listLast(name,"."), "doc")>
<cfset doc = doc.init(finalfile)>
<cfset wordext.init(doc)>
<cfoutput>
<pre>
#wordext.getText()#
</pre>
</cfoutput>
</cfcase>

<cfcase value="xls,xlsx">
<cfset finalfile = Replace(theFile, listLast(name,"."), "xls")>
<cfset excel = excel.init(finalfile)>
<cfset xlsext = xlsext.init(excel)>
<cfoutput>
<pre>
#xlsext.getText()#
</pre>
</cfoutput>
</cfcase>

<cfcase value="ppt,pptx">
<cfset finalfile = Replace(theFile, listLast(name,"."), "ppt")>
<cfset ppt = ppt.init(finalfile)>
<cfset pptext = pptext.init(ppt)>
<cfoutput>
<pre>
#pptext.getText(true,true)#
</pre>
</cfoutput>
</cfcase>
</cfswitch>

<p><hr/></p>

</cfoutput>

This is a short description of the error:

Object instantiation exception.

An exception occurred while instantiating a Java object. The class must not be an interface or an abstract class. Error: ''.

The error occurred in C:/ColdFusion11/cfusion/wwwroot/TicketOnThePlane/test.cfm: line 66
64 : <cfcase value="ppt,pptx">
65 : <cfset finalfile = Replace(theFile, listLast(name,"."), "ppt")>
66 : <cfset ppt = ppt.init(finalfile)>
67 : <cfset pptext = pptext.init(ppt)>
68 : <cfoutput>

1 answer:

Answer 0 (score: 2)

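The classes you are using only work with the binary Office formats, i.e. Office 97-2003. Changing the file extension does not help, because it does not actually change the format of the file: internally the files are still OOXML.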
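To see why renaming cannot help, it is enough to look at the first bytes of a file: binary Office 97-2003 files are OLE2 compound documents, while docx/xlsx/pptx files are ZIP packages. The Java sketch below (Java because POI itself is a Java library; the class and method names are hypothetical) checks those two signatures, which stay the same whatever the extension is.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class OfficeContainerSniffer {
    // OLE2 compound document signature: binary .doc/.xls/.ppt (Office 97-2003)
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header "PK\3\4": OOXML .docx/.xlsx/.pptx files are ZIP packages
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // Classifies content by its leading bytes; the file extension plays no part
    static String detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return "OLE2";
        if (startsWith(header, ZIP_MAGIC)) return "OOXML";
        return "UNKNOWN";
    }

    static String detect(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            // the first 8 bytes are enough to tell the two containers apart
            return detect(in.readNBytes(OLE2_MAGIC.length));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(data, 0, prefix.length, prefix, 0, prefix.length);
    }

    public static void main(String[] args) throws IOException {
        // Simulate a .docx renamed to .doc: only the name changes, the ZIP bytes stay the same
        Path renamed = Files.createTempFile("report", ".doc");
        Files.write(renamed, new byte[] {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00, 0x06, 0x00});
        System.out.println(renamed.getFileName() + " -> " + detect(renamed));
        Files.delete(renamed);
    }
}
```

Because the renamed file still starts with the ZIP signature, the binary-format classes used in the question (HWPFDocument, HSSFWorkbook, HSLFSlideShow) cannot open it.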

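If you want to extract text from different file types, in both the binary and the OOXML formats, use ExtractorFactory. It automatically determines the correct extractor for a given file (if the format is supported). There is also another entry at the end of the link you posted that shows how to use the ExtractorFactory.

NB: This requires the OOXML POI jars.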

....
// load the POI jars (including the OOXML jars) through JavaLoader
loader = createObject("component", "javaloader.JavaLoader").init(paths);
// ExtractorFactory inspects the file and returns a matching text extractor
extractorFactory = loader.create("org.apache.poi.extractor.ExtractorFactory");
pathToFile = "c:/path/to/someFile.xlsx";
myfile = createObject("java","java.io.File").init(pathToFile);
extractor = extractorFactory.createExtractor(myfile);
WriteDump( extractor.getText());
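ExtractorFactory decides from the file itself rather than from its name. As a rough, hypothetical illustration of that idea (not POI's actual implementation), the Java sketch below tells a Word, Excel or PowerPoint OOXML package apart by its ZIP entry names. It assumes the conventional main-part locations word/document.xml, xl/workbook.xml and ppt/presentation.xml; a robust reader would follow the package relationships instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class OoxmlKindSniffer {
    // Returns "Word", "Excel", "PowerPoint" or "unknown" based only on ZIP entry names
    static String detectKind(byte[] pkg) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(pkg))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                switch (entry.getName()) {
                    case "word/document.xml": return "Word";
                    case "xl/workbook.xml": return "Excel";
                    case "ppt/presentation.xml": return "PowerPoint";
                    default: break; // keep scanning the remaining entries
                }
            }
            return "unknown";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds an in-memory ZIP with the given (empty) entries, for demonstration only
    static byte[] zipWith(String... names) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : names) {
                zip.putNextEntry(new ZipEntry(name));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] pkg = zipWith("[Content_Types].xml", "_rels/.rels", "xl/workbook.xml");
        System.out.println(detectKind(pkg));
    }
}
```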

Edit:

Re "This is a short description of the error:"

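Side note: when working with Java objects, the error message text often shows only a boilerplate message. You need to look at the stack trace to find the real "cause" of the error.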
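As a small illustration of what looking for the real cause means, the hypothetical Java helper below walks an exception's getCause() chain down to the innermost exception, which is usually the one that says what actually went wrong. The exception chain built in main is made up for the demo; it only resembles what a failed constructor call made through reflection can produce.

```java
import java.io.FileNotFoundException;
import java.lang.reflect.InvocationTargetException;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

class RootCause {
    // Follows getCause() to the innermost exception, stopping if the chain loops back on itself
    static Throwable rootCause(Throwable t) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Throwable current = t;
        while (current.getCause() != null && seen.add(current)) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Made-up chain: a generic wrapper message hiding the underlying file error
        Throwable reported = new RuntimeException("Object instantiation exception.",
            new InvocationTargetException(new FileNotFoundException("missing.ppt")));
        Throwable root = rootCause(reported);
        System.out.println(reported.getMessage() + " <- caused by " + root);
    }
}
```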