I am using the following code to convert a string to sentence case.
var sentenceRegex = new Regex(@"(^[a-z])|[?!.:;]\s+(.)", RegexOptions.ExplicitCapture);
var result = sentenceRegex.Replace(toConvert.ToLower(), s => s.Value.ToUpper());
However, it fails when the sentence starts with HTML tags, as in the example below.
I want to skip the HTML tags and convert only the text to sentence case. Current text:
<BOLD_HTML_TAG>lorem ipsum is simply dummy</BOLD_HTML_TAG> text of the printing and typesetting industry.
<PARAGRAPH_TAG>LOREM ipsum has been the industry's standard dummy
text ever since the 1500s</PARAGRAPH_TAG>.
After sentence casing, the output should look like this:
<BOLD_HTML_TAG>Lorem ipsum is simply dummy</BOLD_HTML_TAG> text of the
printing and typesetting industry. <PARAGRAPH_TAG>Lorem ipsum has been
the industry's standard dummy text ever since the
1500s</PARAGRAPH_TAG>.
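I would appreciate it if someone could help me with a regular expression that ignores (but does not remove) the HTML tags in the string and converts the text to sentence case.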
Answer 0 (score: 0)
It may not be pretty, but it works ;)
using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        string toConvert = "<BOLD_HTML_TAG>lorem ipsum is simply dummy</BOLD_HTML_TAG> text of the printing and typesetting industry." +
            "<PARAGRAPH_TAG>LOREM ipsum has been the industry's standard dummy " +
            "text ever since the 1500s</PARAGRAPH_TAG>.";

        // Match only the text between <tag> and its matching </tag>; the tags themselves are not consumed.
        var sentenceRegex = new Regex(@"(?<=<(?<tag>\w+)>).*?(?=</\k<tag>>)", RegexOptions.ExplicitCapture);

        // Upper-case the first character and lower-case the rest; leave empty matches (e.g. <b></b>) untouched.
        var result = sentenceRegex.Replace(toConvert, s => s.Value.Length == 0
            ? s.Value
            : s.Value.Substring(0, 1).ToUpper() + s.Value.Substring(1).ToLower());

        Console.WriteLine(toConvert + "\r\n" + result);
    }
}
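The regex uses a named group inside a lookbehind and a lookahead to match the opening tag and its closing tag, so only the text between them is matched. The replacement then upper-cases the first character of that text and lower-cases the rest.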
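Note that the pattern above only touches text that sits inside a tag pair, so text between or outside tags is left exactly as it was. If that text also needs casing, one possible variant is sketched below. It is only a sketch under a few assumptions: a tag is anything of the form `<...>`, a sentence ends at one of `? ! . : ;` followed by whitespace (the same rule as the regex in the question), and the names `SentenceCaseSketch`, `ToSentenceCase`, `TokenRegex` and `Terminators` are made up for illustration. It splits the input into tag and text tokens, copies tags through unchanged, and keeps the "capitalize next character" state across tags.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class SentenceCaseSketch
{
    // Assumption: a tag is "<", any characters except ">", then ">".
    // A stray "<" with no closing ">" is treated as ordinary text.
    private static readonly Regex TokenRegex =
        new Regex(@"(?<tag><[^>]*>)|(?<text>[^<]+|<)");

    // Same sentence terminators as the character class in the question.
    private const string Terminators = "?!.:;";

    public static string ToSentenceCase(string input)
    {
        var output = new StringBuilder(input.Length);
        bool capitalizeNext = true;   // the start of the input begins a sentence
        bool afterTerminator = false; // last text character was a terminator

        foreach (Match token in TokenRegex.Matches(input))
        {
            if (token.Groups["tag"].Success)
            {
                output.Append(token.Value); // tags are copied through untouched
                continue;
            }

            foreach (char c in token.Value)
            {
                if (Terminators.IndexOf(c) >= 0)
                {
                    output.Append(c);
                    afterTerminator = true;
                }
                else if (char.IsWhiteSpace(c))
                {
                    output.Append(c);
                    if (afterTerminator)
                    {
                        // terminator + whitespace: the next character starts a sentence
                        capitalizeNext = true;
                        afterTerminator = false;
                    }
                }
                else
                {
                    output.Append(capitalizeNext
                        ? char.ToUpperInvariant(c)
                        : char.ToLowerInvariant(c));
                    capitalizeNext = false;
                    afterTerminator = false;
                }
            }
        }

        return output.ToString();
    }
}

public class SketchProgram
{
    public static void Main()
    {
        // Sample from the question, with a space between the two sentences.
        string toConvert =
            "<BOLD_HTML_TAG>lorem ipsum is simply dummy</BOLD_HTML_TAG> text of the printing and typesetting industry. " +
            "<PARAGRAPH_TAG>LOREM ipsum has been the industry's standard dummy " +
            "text ever since the 1500s</PARAGRAPH_TAG>.";

        Console.WriteLine(SentenceCaseSketch.ToSentenceCase(toConvert));
    }
}
```

For the sample from the question this should print the text with "Lorem" capitalized inside both tags and the tag names left as they are. Because a new sentence is only detected when whitespace follows the terminator, the second "LOREM" would stay lower-case if "industry." ran directly into `<PARAGRAPH_TAG>` with no space. This is a character scan rather than an HTML parser, so comments, scripts, or a `>` inside an attribute value would need something more robust.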
Regards