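I need a regular expression to solve the following problem (links to similar questions, related tutorials, etc. are also welcome):
"__some_words_a_b___" => "__some words a b___"
"____" => "____"
"some___words" => "some   words"
So I want to replace the underscores between words with spaces while keeping the leading and trailing underscores. I found this: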
^[ \t]+|[ \t]+$
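I think the solution will be something along those lines. I will be using it in jQuery, Java (standard libraries), and XSLT.
Addition: a sentence does not necessarily start or end with an underscore. A sentence may also contain no underscores at all. Multiple underscores should be rendered as multiple spaces.
Best regards, Lasse Espeholt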
Answer 0 (score: 3)
This should work in JavaScript:
var newString = oldString.replace(/([^_].*?)_(?=[^_\s])/g,"$1 ");
Edit: if the string already contains spaces, you may need to use the following instead:
var newString = oldString.replace(/([^_\s].*?)_(?=[^_\s])/g,"$1 ");
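Have I missed any other edge cases? :) Oh yes, one more edge case: a trailing underscore is kept if it is followed by whitespace (such as a newline, end of line, etc.).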
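To see how this pattern behaves on the examples from the question, here is a rough Python port. It assumes both character classes are meant to read `[^_\s]` ("neither underscore nor whitespace"); the function name `replace_single_underscores` is made up for illustration.

```python
import re

# Assumption: both character classes are intended as [^_\s].
PATTERN = re.compile(r'([^_\s].*?)_(?=[^_\s])')

def replace_single_underscores(text):
    """Replace an underscore with a space when the character right after it
    is neither an underscore nor whitespace."""
    return PATTERN.sub(r'\1 ', text)

print(replace_single_underscores('__some_words_a_b___'))
print(replace_single_underscores('____'))
print(replace_single_underscores('some___words'))
```

Only the last underscore of a run is followed by a non-underscore character, so a run such as `some___words` is converted only partly. That is the case the alternative in the next edit is meant to handle.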
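Edit: an alternative solution for when there is more than one underscore between words:
// Split into leading underscores, middle, and trailing underscores (either run may be empty)
var parts = oldString.match(/^(_*)([\s\S]*?)(_*)$/);
var newString = parts[1] + parts[2].replace(/_/g, " ") + parts[3];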
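Answer 1 (score: 1)
I think it would be simpler to combine a regular expression with a plain string replacement. Here is an answer in Python, since I'm not familiar enough with jQuery, Java, or XSLT: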
import re

def mangle_string(string):
    """
    Replace underscores between letters with spaces, leave leading and
    trailing underscores alone.
    """
    # Match a string that starts with zero or more underscores, followed by a
    # non-underscore, followed by zero or more of any characters, followed by
    # another non-underscore, followed by zero or more underscores, then the
    # end of the string. If the string doesn't match that pattern, then return
    # it unmodified.
    m = re.search(r'^(_*)([^_]+.*[^_]+)(_*)$', string)
    if not m:
        return string
    # Return the concatenation of first group (the leading underscores), then
    # the middle group (everything else) with any internal underscores
    # replaced with spaces, then the last group (the trailing underscores).
    return m.group(1) + m.group(2).replace('_', ' ') + m.group(3)
Answer 2 (score: 0)
Maybe this is what you want (JavaScript):
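// [^\W_] is a word character other than "_"; the lookahead leaves the next character unconsumed
var newString = oldString.replace(/([^\W_])_(?=[^\W_])/g, "$1 ");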
If there can be several underscores between words, then:
var newString = oldString.replace(/([^\W_])_+(?=[^\W_])/g, "$1 ");
If you want to keep as many spaces as there were underscores:
var newString = oldString.replace(/([^\W_])(_+)(?=[^\W_])/g, function(match, before, underscores) {
  // new Array(n + 1).join(' ') produces a string of n spaces
  return before + new Array(underscores.length + 1).join(' ');
});
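For comparison, the same "one space per underscore" idea can be sketched in Python with a replacement callback. This is not a port of the code above: it swaps the capture groups for a lookbehind and a lookahead, and the function name `underscores_to_spaces` is hypothetical.

```python
import re

# A run of underscores with a word character other than "_" on both sides.
# The lookbehind and lookahead do not consume those neighbours, so
# single-letter words such as "a_b_c" are handled as well.
INNER_RUN = re.compile(r'(?<=[^\W_])_+(?=[^\W_])')

def underscores_to_spaces(text):
    """Replace each inner run of underscores with the same number of spaces."""
    return INNER_RUN.sub(lambda m: ' ' * len(m.group()), text)

print(underscores_to_spaces('__some_words_a_b___'))
print(underscores_to_spaces('some___words'))
```

Because the pattern requires a word character on each side, underscores next to punctuation or existing spaces are left alone; whether that is desirable depends on the input.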
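Answer 3 (score: 0)
I wouldn't use a regex for this. I would count the leading and trailing underscores, then join the leading substring (if any), middle.replace('_', ' '), and the trailing substring (if any). If the run of leading underscores extends to the end of the string, return the original string immediately.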
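A minimal Python sketch of this regex-free approach might look like the following. The function and variable names are made up for illustration, and the same steps should carry over to Java or JavaScript string methods.

```python
def replace_inner_underscores(text):
    """Replace underscores with spaces, except in leading and trailing runs."""
    # Count the leading underscores.
    lead = len(text) - len(text.lstrip('_'))
    # The run of leading underscores reaches the end (or the string is empty):
    # there is no middle part, so return the original string.
    if lead == len(text):
        return text
    # Count the trailing underscores.
    trail = len(text) - len(text.rstrip('_'))
    end = len(text) - trail
    # Leading run + middle with underscores replaced + trailing run.
    return text[:lead] + text[lead:end].replace('_', ' ') + text[end:]

for sample in ['__some_words_a_b___', '____', 'some___words', 'plain']:
    print(repr(sample), '=>', repr(replace_inner_underscores(sample)))
```

Each underscore in the middle part becomes exactly one space, so runs of underscores keep their length.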