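Curious whether anyone else has noticed this: I have a WYSIWYG editor that users occasionally paste into from Microsoft Word. There is a Word cleaner, but not everyone is a genius.
If I parse that text elsewhere, it displays fine. But if I truncate it, the MS Word code shows up.
Does anyone know why truncating makes this unusable? And does anyone know how to sanitize and truncate at the same time?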
Update:
Here is an example of the MS Word markup that shows up after I truncate:
<!--[if gte mso 9]><xml> <br /> <o:OfficeDocumentSettings> <br /> <o:AllowPNG/> <br /> </o:OfficeDocumentSettings> <br /></xml><![endif]--><!--[if gte mso 9]><xml> <br /> <w:WordDocument> <br /> <w:Zoom>0</w:Zoom> <br /> <w:TrackMoves>false</w:TrackMoves> <br /> <w:TrackFormatting/> <br /> <w:PunctuationKerning/> <br /> <w:DrawingGridHorizontalSpacing>18 pt</w:DrawingGridHorizontalSpacing> <br /> <w:DrawingGridVerticalSpacing>18 pt</w:DrawingGridVerticalSpacing> <br /> <w:DisplayHorizontalDrawingGridEvery>0</w:DisplayHorizontalDrawingGridEvery> <br /> <w:DisplayVerticalDrawingGridEvery>0</w:DisplayVerticalDrawingGridEvery> <br /> <w:ValidateAgainstSchemas/> <br /> <w:SaveIfXMLInvalid>false</w:SaveIfXMLInvalid> <br /> <w:IgnoreMixedContent>false</w:IgnoreMixedContent> <br /> <w:AlwaysShowPlaceholderText>false</w:AlwaysShowPlaceholderText> <br /> <w:Compatibility> <br /> <w:BreakWrappedTables/> <br /> <w:DontGrowAutofit/> <br /> <w:DontAutofitConstrainedTables/> <br /> <w:DontVertAlignInTxbx/> <br /> </w:Compatibility> <br /> </w:WordDocument> <br /></xml><![endif]--><!--[if gte mso 9]><xml> <br /> <w:LatentStyles DefLockedState="false" LatentStyleCount="276"> <br /> </w:LatentStyles> <br /></xml><![endif]--> <!--
{Cke Protected}%3 C!%2 D%2 D%7 Bcke Protected%7 D%253 C!%252 D%252 D%257 Bcke Protected%257 D%25253 C!%25252 D%25252 D%25257 Bcke Protected%25257 D%2525253 C!%2525252 D%2525252 D%2525257 Bcke Protected%2525257 D%252525253 C!%252525252 D%252525252 D%252525257 Bcke Protected%252525257 D%25252525253 C!%25252525252 D%25252525252 D%25252525257 Bcke Protected%25252525257 D%2525252525253 C!%2525252525252 D%2525252525252 D%2525252525250 A%25252525252520%2525252525252 F*%25252525252520 Font%25252525252520 Definitions%25252525252520*%2525252525252 F%2525252525250 A%25252525252540font Face%2525252525250 A%25252525252509%2525252525257 Bfont Family%2525252525253 A Times%2525252525253 B%2525252525250 A%25252525252509panose 1%2525252525253 A2%252525252525200%252525252525205%252525252525200%252525252525200%252525252525200%252525252525200%252525252525200%252525252525200%252525252525200%2525252525253 B%2525252525250 A%25252525252509mso Font Charset%2525252525253 A0%2525252525253 B%2525252525250 A%25252525252509mso Generic Font Family%2525252525253 Aauto%2525252525253 B%2525252525250 A%25252525252509mso Font Pitch%2525252525253 Avariable%2525252525253 B%2525252525250 A%25252525252509mso Font Signature%2525252525253 A3%252525252525200%252525252525200%252525252525200%252525252525201%252525252525200%2525252525253 B%2525252525257 D%2525252525250 A%25252525252540font Face%2525252525250 A%25252525252509%2525252525257 Bfont Family%2525252525253 A Verdana%2525252525253 B%2525252525250 A%25252525252509panose 1%2525252525253 A2%2525252525252011%252525252525206%252525252525204%25
The whole text is about 600 characters. Here are the first 200 or so:
“Excellent” – The New York Times
“4 Stars” - The Star-Ledger
“Best Romantic Restaurant” – Suburban Essex
“Best View” – OpenTable
In December 1986, the Knowles opened Highlawn after months of restoration to the former open-air “casino” which had, along with the now-prosperous park, been neglected for several years.
Here is a custom sanitizer I put together with help from Stack Overflow:
def sanitized_text(text)
  text.gsub(/<[^>]*>/, '')
end
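The problem with this sanitizer is that after I truncate to 125 characters, it returns nothing but blank space. When I extend it to 600 characters, I get a line that is another MS Word conditional statement.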
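One plausible explanation: Word wraps its settings in conditional comments (`<!--[if gte mso 9]> ... <![endif]-->`). The regex above removes each tag individually but keeps the text between tags (`0`, `false`, `18 pt`, and the spaces around every `<br />`), so the first 125 characters can be almost all whitespace. The sketch below, assuming plain Ruby and a hypothetical helper named `strip_word_markup`, removes whole comments before stripping tags. It also assumes each comment still has its closing `-->`, which is one more reason to sanitize before truncating rather than after.

```ruby
# Hypothetical helper name. Removes whole HTML comments first (Word's
# <!--[if gte mso 9]> ... <![endif]--> blocks are comments), then the
# remaining tags, then collapses the leftover whitespace.
def strip_word_markup(html)
  without_comments = html.gsub(/<!--.*?-->/m, '')  # lazy: stop at the first -->
  without_tags = without_comments.gsub(/<[^>]*>/, '')
  without_tags.gsub(/\s+/, ' ').strip
end

# Shortened, hypothetical sample in the shape of the pasted Word output.
sample = '<!--[if gte mso 9]><xml> <br /> <w:WordDocument> <br /> ' \
         '<w:Zoom>0</w:Zoom> <br /> </w:WordDocument> <br /></xml><![endif]-->' \
         '<p>"Excellent" - The New York Times</p>'

tags_only = sample.gsub(/<[^>]*>/, '')  # the regex from the sanitizer above
cleaned   = strip_word_markup(sample)

p tags_only  # leading spaces and a stray "0" from <w:Zoom> survive
p cleaned    # only the visible text remains
```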
Update: here is the code that produces the MS Word content:
= truncate(organization.about_us, 125)
Note that if I just do this:
= organization.about_us
it works fine, but of course it isn't truncated.
I should also add that this is Ruby 1.8.7 / Rails 2.3.5.
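The two snippets above differ in that `truncate` runs on the raw HTML, so the cut can land in the middle of markup; in the view, sanitizing first would look like `= truncate(sanitized_text(organization.about_us), 125)`. The sketch below illustrates the ordering in plain Ruby. `truncate_text` is a hypothetical stand-in that only approximates Rails' `truncate` (no word-boundary handling), and the `style` attribute in the sample is made up.

```ruby
# Rough plain-Ruby stand-in for Rails' truncate (hypothetical name):
# keeps the result, omission included, within +length+ characters.
def truncate_text(text, length, omission = '...')
  return text if text.length <= length
  text[0, length - omission.length] + omission
end

# Same tag-stripping regex as the sanitizer in the question.
def sanitized_text(text)
  text.gsub(/<[^>]*>/, '')
end

# Hypothetical pasted HTML: the markup comes before the visible text.
raw = '<p style="margin:0in;font-family:Verdana">' \
      'In December 1986, the Knowles opened Highlawn after months of restoration.</p>'

# Truncate first: the cut lands inside the opening tag, leaving no visible text.
truncate_first = truncate_text(raw, 40)

# Sanitize first: the cut is applied to the text the reader actually sees.
sanitize_first = truncate_text(sanitized_text(raw), 40)

p truncate_first
p sanitize_first
```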
Answer 0 (score: 1)
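Truncating HTML is always a pain, because you can end up splitting tags and entities. Without proper UTF-8 handling, you also risk cutting a multi-byte character in half.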
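To illustrate the multi-byte point, here is a small sketch that assumes Ruby 1.9.3 or later (where `String#byteslice` exists). In Ruby 1.8.7, as used in the question, `String#[]` works on bytes, which is why Rails 2.x goes through `mb_chars` for character-aware operations.

```ruby
# encoding: utf-8
# Ruby 1.9.3+ sketch: cutting by bytes vs. by characters.
text = "Café Highlawn"

by_bytes = text.byteslice(0, 4)  # "é" is two bytes in UTF-8; this keeps only the first
by_chars = text[0, 4]            # String#[] counts characters in Ruby 1.9+

p by_bytes.valid_encoding?  # => false
p by_chars                  # => "Café"
```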
Another thing to watch out for is overly greedy regular expressions:
def sanitized_text(text)
  text.gsub(/<[^>]*?>/, '')
end
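`*?` is lazy and matches as little as possible, whereas `*` is greedy and matches as much as possible. (With the `[^>]` character class used here, both versions stop at the first `>`; the difference matters for patterns such as `<.*>`.)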
For example:
<A><B>
With the wrong expression (a greedy `<.*>`), this can be read as "<", "A><B" and ">", that is, a single "tag" spanning both.
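The difference can be checked directly with `String#scan`, comparing a greedy `.*`, a lazy `.*?`, and the negated character class used in the sanitizer:

```ruby
html = "<A><B>"

greedy  = html.scan(/<.*>/)     # .* runs to the last '>' it can find
lazy    = html.scan(/<.*?>/)    # .*? stops at the first '>'
negated = html.scan(/<[^>]*>/)  # [^>] can never step past a '>'

p greedy   # => ["<A><B>"]
p lazy     # => ["<A>", "<B>"]
p negated  # => ["<A>", "<B>"]
```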
Edit: I tried to reproduce this with no luck.
With the example below, using the pasted text and the sanitizer, everything seems to work.
# app/controllers/example_controller.rb
class ExampleController < ApplicationController
  def index
    @text = '<!--[if gte mso 9]><xml> <br /> <o:OfficeDocumentSettings> <br /> <o:AllowPNG/> <br /> </o:OfficeDocumentSettings> <br /></xml><![endif]--><!--[if gte mso 9]><xml> <br /> <w:WordDocument> <br /> <w:Zoom>0</w:Zoom> <br /> <w:TrackMoves>false</w:TrackMoves> <br /> <w:TrackFormatting/> <br /> <w:PunctuationKerning/> <br /> <w:DrawingGridHorizontalSpacing>18 pt</w:DrawingGridHorizontalSpacing> <br /> <w:DrawingGridVerticalSpacing>18 pt</w:DrawingGridVerticalSpacing> <br /> <w:DisplayHorizontalDrawingGridEvery>0</w:DisplayHorizontalDrawingGridEvery> <br /> <w:DisplayVerticalDrawingGridEvery>0</w:DisplayVerticalDrawingGridEvery> <br /> <w:ValidateAgainstSchemas/> <br /> <w:SaveIfXMLInvalid>false</w:SaveIfXMLInvalid> <br /> <w:IgnoreMixedContent>false</w:IgnoreMixedContent> <br /> <w:AlwaysShowPlaceholderText>false</w:AlwaysShowPlaceholderText> <br /> <w:Compatibility> <br /> <w:BreakWrappedTables/> <br /> <w:DontGrowAutofit/> <br /> <w:DontAutofitConstrainedTables/> <br /> <w:DontVertAlignInTxbx/> <br /> </w:Compatibility> <br /> </w:WordDocument> <br /></xml><![endif]--><!--[if gte mso 9]><xml> <br /> <w:LatentStyles DefLockedState="false" LatentStyleCount="276"> <br /> </w:LatentStyles> <br /></xml><![endif]--> <!--
{Cke Protected}%3 C!%2 D%2 D%7 Bcke Protected%7 D%253 C!%252 D%252 D%257 Bcke Protected%257 D%25253 C!%25252 D%25252 D%25257 Bcke Protected%25257 D%2525253 C!%2525252 D%2525252 D%2525257 Bcke Protected%2525257 D%252525253 C!%252525252 D%252525252 D%252525257 Bcke Protected%252525257 D%25252525253 C!%25252525252 D%25252525252 D%25252525257 Bcke Protected%25252525257 D%2525252525253 C!%2525252525252 D%2525252525252 D%2525252525250 A%25252525252520%2525252525252 F*%25252525252520 Font%25252525252520 Definitions%25252525252520*%2525252525252 F%2525252525250 A%25252525252540font Face%2525252525250 A%25252525252509%2525252525257 Bfont Family%2525252525253 A Times%2525252525253 B%2525252525250 A%25252525252509panose 1%2525252525253 A2%252525252525200%252525252525205%252525252525200%252525252525200%252525252525200%252525252525200%252525252525200%252525252525200%252525252525200%2525252525253 B%2525252525250 A%25252525252509mso Font Charset%2525252525253 A0%2525252525253 B%2525252525250 A%25252525252509mso Generic Font Family%2525252525253 Aauto%2525252525253 B%2525252525250 A%25252525252509mso Font Pitch%2525252525253 Avariable%2525252525253 B%2525252525250 A%25252525252509mso Font Signature%2525252525253 A3%252525252525200%252525252525200%252525252525200%252525252525201%252525252525200%2525252525253 B%2525252525257 D%2525252525250 A%25252525252540font Face%2525252525250 A%25252525252509%2525252525257 Bfont Family%2525252525253 A Verdana%2525252525253 B%2525252525250 A%25252525252509panose 1%2525252525253 A2%2525252525252011%252525252525206%252525252525204%2'
  end
end

# app/helpers/example_helper.rb
module ExampleHelper
  def sanitized_text(text)
    text.gsub(/<[^>]*>/, '')
  end
end
The view itself is just what you had:
<!-- app/views/example/index.html.erb -->
<body>
  <strong>Original</strong>
  <div>
    <%= sanitized_text(@text) %>
  </div>
  <strong>Truncated</strong>
  <div>
    <%= truncate(sanitized_text(@text), :length => 125) %>
  </div>
  <strong>Truncated With Deprecated Option</strong>
  <div>
    <%= truncate(sanitized_text(@text), 125) %>
  </div>
</body>
This was tested with Ruby 1.8.7p174 and Rails 2.3.5 on OS X, using WEBrick.