我需要创建一些Javascript,它可以从文本框中搜索输入的HTML,并忽略所有标签,以自动换行设置数字,如70,并添加<br>
标签。
我还需要找到所有ascii,如©
和–
,并将其计为一个空格而不是5或4个空格。
所以代码需要:
<b>Hello</b> Here is some code that I would like to wrap. Lets pretend this goes on for over 70 spaces.
输出将是:
<b>Hello</b> Here is some code that I would like to wrap. Lets pretend <br>
this goes on for over 70 spaces.
这可能吗?我该怎么开始?是否已经有了这个工具?
顺便说一下,CSS无法使用。
答案 0 :(得分:2)
虽然短语“正则表达式”和“解析HTML”的组合通常会导致entire universes to crumble,但您的用例似乎过于简单以至于它可以正常工作,但事实上您希望在换行后保留HTML格式在空格分隔的序列上工作要容易得多。以下是您想要做的非常粗略的近似:
input = "<b>Hello</b> Here is some code that I would like to wrap. Let's pretend this goes on for over 70 spaces. Better ¥€±, let's <em>make</em> it go on for more than 70, and pick üþ a whole <strong>buñ©h</strong> of crazy symbols along the way.";
words = input.split(' ');
lengths = [];
for (var i = 0; i < words.length; i++)
lengths.push(words[i].replace(/<.+>/g, '').replace(/&.+;/g, ' ').length);
line = [], offset = 0, output = [];
for (var i = 0; i < words.length; i ++) {
if (offset + (lengths[i] + line.length - 1) < 70) {
line.push(words[i]);
offset += lengths[i];
}
else {
output.push(line.join(' '));
offset = 0; line = [], i -= 1;;
}
if (i == words.length - 1)
output.push(line.join(' '));
}
output = output.join('<br />');
导致
Hello Here is some code that I would like to wrap. Let's pretend this
goes on for over 70 spaces. Better ¥€±, let's make it go on for more
than 70, and pick üþ a whole buñ©h of crazy symbols along the way.
请注意,HTML代码(b
,em
,strong
)会被保留,只是Markdown不会显示它们。
基本上,输入字符串在每个空格处被分成单词,这是天真的,可能会造成麻烦,但它是一个开始。然后,在删除任何类似于HTML标记或实体的内容之后计算每个单词的长度。然后,这是一个简单的问题,迭代每个单词,保持我们所在的列的运行记录;一旦我们达到70,我们将聚合的单词弹出到输出字符串并重置。同样,它非常粗糙,但它应该足以满足大多数基本HTML。
答案 1 :(得分:0)
此解决方案通过令牌计数“遍历”字符串令牌,直到所需的行长度。正则表达式捕获四种不同的令牌之一:
请注意,我已添加行终止符令牌,以防您的文本框已使用换行格式化(带有可选的回车符)。这是一个JavaScript函数,它使用String.replace()
遍历字符串,并使用匿名回调来计算令牌:
// Break up textarea into lines having len chars.
function breakupHTML(text, len) {
var re = /(<(?:[^'"<>]+|'[^']*'|"[^"]*")*>)|(&(?:\w+|#x[\da-f]+|#\d+);)|(\r?\n)|(.)/ig;
var count = 0; // Initialize line char count.
return text.replace(re,
function(m0, m1, m2, m3, m4) {
// Case 1: An HTML tag. Do not add to count.
if (m1) return m1;
// Case 2: An HTML entity. Add one to count.
if (m2) {
if (++count >= len) {
count = 0;
m2 += '<br>\n';
}
return m2;
}
// Case 3: A hard coded line terminator.
if (m3) {
count = 0;
return '<br>\n';
}
// Case 4: Any other single character.
if (m4) {
if (++count >= len) {
count = 0;
m4 += '<br>\n';
}
return m4;
} // Never get here.
});
}
以下是注释格式的正则表达式细分,以便您可以看到正在捕获的内容:
p = re.compile(r"""
# Match one HTML open/close tag, HTML entity or other char.
(<(?:[^'"<>]+|'[^']*'|"[^"]*")*>) # $1: HTML open/close tag
| (&(?:\w+|\#x[\da-f]+|\#\d+);) # $2: HTML entity.
| (\r?\n) # $3: Line terminator.
| (.) # $4: Any other character.
""", re.IGNORECASE | re.VERBOSE)
答案 2 :(得分:0)
不想释放Cthulhu,我决定(不同于我的答案)而是为您的问题提供答案,而不是尝试使用正则表达式解析HTML。相反,我转向了令人敬畏的力量,即jQuery,并用它来解析客户端的HTML。
工作小提琴:http://jsfiddle.net/CKQ9f/6/
html:
<div id="wordwrapOriginal">Here is some code that I would like to wrap. Lets pretend this goes on for over 70 spaces.etend this g<b class="foo bar">Helloend this goes on for over 70 spaces.etend</b>Here is some code that I would like to wrap. Lets pretend this goes on for over 70 spaces.etend this g</div>
<hr>
<div id="wordwrapResult"></div>
jQuery:
// lifted from here: https://stackoverflow.com/a/5259788/808921
$.fn.outerHTML = function() {
$t = $(this);
if( "outerHTML" in $t[0] )
{ return $t[0].outerHTML; }
else
{
var content = $t.wrap('<div></div>').parent().html();
$t.unwrap();
return content;
}
}
// takes plain strings (no markup) and adds <br> to
// them when each "line" has exceeded the maxLineLen
function breakLines(text, maxLineLen, startOffset)
{
var returnVals = {'text' : text, finalOffset : startOffset + text.length};
if (text.length + startOffset > maxLineLen)
{
var wrappedWords = "";
var wordsArr = text.split(' ');
var lineLen = startOffset;
for (var i = 0; i < wordsArr.length; i++)
{
if (wordsArr[i].length + lineLen > maxLineLen)
{
wrappedWords += '<br>';
lineLen = 0;
}
wrappedWords += (wordsArr[i] + ' ');
lineLen += (wordsArr[i].length + 1);
}
returnVals['text'] = wrappedWords.replace(/\s$/, '');
returnVals['finalOffset'] = lineLen;
}
return returnVals;
}
// recursive function which will traverse the "tree" of HTML
// elements under the baseElem, until it finds plain text; at which
// point, it will use the above function to add newlines to that text
function wrapHTML(baseElem, maxLineLen, startOffset)
{
var returnString = "";
var currentOffset = startOffset;
$(baseElem).contents().each(function () {
if (! $(this).contents().length) // plain text
{
var tmp = breakLines($(this).text(), maxLineLen, currentOffset);
returnString += tmp['text'];
currentOffset = tmp['finalOffset'];
}
else // markup
{
var markup = $(this).clone();
var tmp = wrapHTML(this, maxLineLen, currentOffset);
markup.html(tmp['html']);
returnString += $(markup).outerHTML();
currentOffset = tmp['finalOffset'];
}
});
return {'html': returnString, 'finalOffset': currentOffset};
}
$(function () {
wrappedHTML = wrapHTML("#wordwrapOriginal", 70, 0);
$("#wordwrapResult").html(wrappedHTML['html']);
});
请注意递归 - 使用正则表达式无法做到这一点!