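Clients upload .doc files to a server directory. Following Ray Camden's post here, I use POI to extract the text, which is stored in a text/memo field in a MySQL database and exposed through a web service consumed via WSDL. Everything works as expected until a consumer of the web service requests a record containing certain (I assume) control characters, at which point the web service throws a 500 error.
In the database, the problem rows appear to contain control characters, and odd characters also show up when the text field is displayed in Firefox. The web service simply returns a CF query with returntype="any", and is called like this: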
<cfinvoke webservice="https://nww.someplace.nhs.uk/cfcs/providerapi.cfc?wsdl"
method="getPendingReferrals" returnvariable="getReferrals">
<cfinvokeargument name="userName" value="#username#">
<cfinvokeargument name="password" value="#password#">
<cfinvokeargument name="maxrows" value="#maxrows#">
</cfinvoke>
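I assume these characters can't be transported through the WSDL-based service, so is there a way to encode them, or do I just need to strip them out with a regular expression or something similar? The component behind the web service: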
<cfcomponent>
<cffunction output="false" access="remote" returntype="any" name="getPendingReferrals">
<cfargument required="false" name="userName" type="string"/>
<cfargument required="false" name="password" type="string"/>
<cfargument required="false" name="maxrows" type="numeric" default="20"/>
<cfset var q="">
<cfinvoke component="cfcs.security" method="checkAuthenticated" returnvariable="checkAuth">
<cfinvokeargument name="username" value="#arguments.userName#">
<cfinvokeargument name="password" value="#arguments.password#">
</cfinvoke>
<cfif checkAuth.authenticates is "true">
<!--- Log the login to a per-day log file --->
<cfset filename = datePart("yyyy", now()) & datePart("m", now()) & datePart("d", now()) & "loginlog.txt">
<cfset outFile = application.Root & "logs\" & filename>
<cfset logLine = "#checkAuth.userName#, #now()#, #cgi.remote_addr#, #left(cgi.http_user_agent, 50)#">
<cfif fileExists(outFile)>
<cffile action="append" file="#outFile#" output="#logLine#">
<cfelse>
<cffile action="write" file="#outFile#" output="#logLine#">
</cfif>
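<!--- Build the recipient filter; note that toStr stays undefined for any other organisationID --->
<cfif checkAuth.organisationID is 1>
<cfset toStr="toID=1">
<cfelseif checkAuth.organisationID is 28>
<cfset toStr="(toID=28 OR toID=29)">
</cfif>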
<cfquery name="q" datasource="mySqlData" maxrows="#arguments.maxrows#">
SELECT messages.messageID, messages.toID, messages.fromID AS referrerID, (SELECT CONCAT(title, ' ',firstName, ' ', lastname) FROM users WHERE users.userID = messages.fromID) as referrerName,messages.threadID, messages.messageBody, messages.dateCreated, messages.dateSent,
messages.deleted, messages.createdByID, (SELECT CONCAT(title, ' ',firstName, ' ', lastname) FROM users WHERE users.userID = messages.createdByID) as createdByName, (SELECT organisationName FROM organisations WHERE messages.originatingOrganisationID = organisations.organisationID) as originatingOrganisationName, messages.originatingOrganisationID, messages.viewed, messages.referral, messages.actioned, messages.patientID, messages.refTypeID, messages.specialtyID, organisations.organisationName AS toOrganisationName, patients.nhsNumber AS patientNHSnumber, patients.patientTitle, patients.patientLastname, patients.patientFirstname, patients.patientDOB, patients.address1 as patientAddress1, patients.address2 AS patientAddress2, patients.address3 AS patientAddress3, patients.address4 AS patientAddress4, patients.postcode AS patientPostcode, patients.patientPhone1
FROM users INNER JOIN (organisations INNER JOIN (patients INNER JOIN messages ON patients.patientID = messages.patientID) ON organisations.organisationID = messages.toID) ON users.userID = messages.fromID
WHERE #toStr#
AND NOT actioned
AND NOT originatingOrganisationID=3
ORDER BY messageID
</cfquery>
<cfif isQuery(q)>
<cfreturn q>
<cfelse>
<cfreturn "Error : in query">
</cfif>
<cfelse>
<cfreturn "Error : failed to authenticate">
</cfif>
</cffunction>
</cfcomponent>
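Answer 0 (score: 2)
You should use a regular expression to strip out all high-ASCII characters. The best one I've found was written up by Ben Nadel, here. (It isn't perfect, though; I made some improvements to it in the comments.)
Basically, if you just want to strip the high-ASCII characters, do this: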
<cfset result = reReplace(messageBody, "[^\x20-\x7E\x0D\x09]", "", "all") />
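This regular expression takes a whitelist approach, keeping only printable characters plus two whitespace characters:
\x20-\x7E = {space} ! " # $ % & ' ( ) * + , - . / 0-9 : ; < = > ? @ A-Z [ \ ] ^ _ ` a-z { | } ~
\x0D = carriage return
\x09 = horizontal tab
If you like this clean-up approach, you can use Sean Coyne's method of updating the query in a loop: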
<cfloop query="q">
<cfset querySetCell(
q,
"messageBody",
clean(q.messageBody[q.currentRow]),
q.currentRow
)/>
</cfloop>
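<!--- Removes every character outside the whitelist described above --->
<cffunction name="clean" output="false" returntype="string">
<cfargument name="in" type="string" required="true" />
<cfreturn reReplace(arguments.in, "[^\x20-\x7E\x0D\x09]", "", "all") />
</cffunction>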
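Note that the whitelist above also drops line feeds (\x0A) and every non-ASCII character, such as accented letters or the curly quotes Word tends to produce. If the 500 error is caused only by control characters that XML 1.0 forbids (an assumption, not something confirmed in the question), a narrower blacklist may be enough. A minimal sketch, where stripXmlControlChars is a hypothetical helper name introduced here:

```cfml
<!--- Hypothetical helper: removes only the control characters below \x20
      that XML 1.0 does not allow (tab, line feed and carriage return are kept).
      Printable and non-ASCII text is left untouched. --->
<cffunction name="stripXmlControlChars" output="false" returntype="string">
<cfargument name="in" type="string" required="true" />
<cfreturn reReplace(arguments.in, "[\x00-\x08\x0B\x0C\x0E-\x1F]", "", "all") />
</cffunction>

<!--- Usage: the same query loop, calling the narrower function instead --->
<cfloop query="q">
<cfset querySetCell(q, "messageBody", stripXmlControlChars(q.messageBody[q.currentRow]), q.currentRow) />
</cfloop>
```

Encoding is not an alternative for these particular characters: XML 1.0 rejects them even when they are written as numeric character references, so they have to be removed or replaced with something else.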
Answer 1 (score: 1)
This is less than ideal, but you could try something like this:
<cfloop query="q">
<cfset querySetCell(q,"messageBody",xmlFormat(q.messageBody[q.currentRow]),q.currentRow) />
</cfloop>
If xmlFormat doesn't handle all of the characters (it is known to miss a few), you may need to write a manual method to strip them out.
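Before writing such a method, it may help to confirm which rows actually contain offending characters. The sketch below assumes the culprits are control characters other than tab, line feed and carriage return, and that q is the query from the question; badIDs is a variable name introduced here for illustration.

```cfml
<!--- Sketch: collect the messageID of every row whose messageBody contains
      a control character other than tab, line feed or carriage return --->
<cfset badIDs = "">
<cfloop query="q">
<cfif reFind("[\x00-\x08\x0B\x0C\x0E-\x1F]", q.messageBody[q.currentRow]) gt 0>
<cfset badIDs = listAppend(badIDs, q.messageID[q.currentRow])>
</cfif>
</cfloop>
<cfoutput>Rows with control characters: #badIDs#</cfoutput>
```

If the list comes back empty while the web service still fails, the problem characters are probably of a different kind, and the character class would need adjusting.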
Answer 2 (score: 0)
As Sean said, you need to escape all kinds of special characters to get valid XML. For example, take a look at http://www.petefreitag.com/item/202.cfm