我一直在努力尝试用一个简单的正则表达式做到这一点,但它从未如此准确。它不一定非常完美。
Source包含
和
标签的组合。我不想算空行。
旧方式:
self.words = rendered.gsub(/<p> <\/p>/,'').gsub(/<p><br\s?\/?>|(?:<br\s?\/?>){2,}/,'<br>').scan(/<br>|<br \/>|<p/).size+1
新方式(不工作: 尝试将所有的+ +转换为段落,然后将它扔进nokogiri来计算其中包含超过3个字符的段落标记(我不知道如何计算?计算1个字母的行也会很好,但这在的JavaScript)
h = rendered
h.gsub!(/<br>\s*<br>/gi,"<p>")
h.gsub!(/<br>/gi,"<p>") if h =~ /<br>\s*<br>/
h.prepend "<p>" if !h =~ /^\s*<p[^>]*>/i
h.replace(/<p>\s*<p>/g,"<p> </p><p>")
Nokogiri::HTML(rendered)
# find+count p tags with at least 1-3 chars?
# this is javascript not ruby, but you get the idea
$('p', c).each(function(i) { // had to trim it to remove whitespaces from start/end.
if ($(this).children('img').length) return; // skip if it's just an image.
if ($.trim($(this).text()).length > 3)
$(this).append("<div class='num'>"+ (n += 1) +"</div>");
})
欢迎其他方法!
示例诗(http://allpoetry.com/poem/7429983-the_many_endings-by-Kevin)
<p>
from the other side of silence<br>
you met me with change and a pocket<br>
of unhappy apples.</p>
<p>
</p>
<p>
<br>
we bled together to black<br>
and chose the path carefully to<br>
france.<br><br>
sometimes when you smile<br>
your radiant footsteps fall<br>
and all around us is silence:<br>
each dream step is<br>
false but full of such glory</p>
<p>
</p>
<p>
<br>
unhappiness never made a student of you:<br>
just two by two by two. now three<br>
this great we that overflows our<br>
heart-cave<br><br>
each jewel-like addition to the delicate<br>
crown. but flowers fall and dreams,<br>
all dreams, come to and end with death.</p>
谢谢!
答案 0 :(得分:0)
对于后人来说,这就是我现在正在使用的内容,它似乎非常准确。非拉丁字符有时会从ckeditor引起一些问题,所以我现在正在剥离它们。
html = Nokogiri::HTML(rendered)
text = html.at('body').inner_text rescue nil
return self.words = rendered.gsub(/<p> <\/p>/,'').gsub(/<p><br\s?\/?>|(?:<br\s?\/?>){2,}/,'<br>').scan(/<br>|<br \/>|<p/).size+1 if !text
#bonus points to strip lines entirely non-letter. idk
#d "text is", text.gsub!(/([\x09|\x0D|\t])|(\xc2\xa0){1,}|[^A-z]/u,'')
text.gsub!(/[^A-z\n]/u,'')
#d "text is", text
self.words = text.strip.scan(/(\s*\n\s*)+/).size+1