Convert to sentence case using Regex and C#

Time: 2015-08-13 11:20:46

Tags: c# regex string

I am using the following code to convert a string to sentence case.

var sentenceRegex = new Regex(@"(^[a-z])|[?!.:;]\s+(.)", RegexOptions.ExplicitCapture);
var result = sentenceRegex.Replace(toConvert.ToLower(), s => s.Value.ToUpper());

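However, it fails when a sentence starts with HTML tags, as in the sample text that follows.

I want to skip the HTML tags and convert only the text to sentence case. Current text: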

<BOLD_HTML_TAG>lorem ipsum is simply dummy</BOLD_HTML_TAG> text of the printing and typesetting industry.
<PARAGRAPH_TAG>LOREM ipsum has been the industry's standard dummy
text ever since the 1500s</PARAGRAPH_TAG>.

After sentence casing, the output should look like this:

<BOLD_HTML_TAG>Lorem ipsum is simply dummy</BOLD_HTML_TAG> text of the
printing and typesetting industry. <PARAGRAPH_TAG>Lorem ipsum has been
the industry's standard dummy text ever since the
1500s</PARAGRAPH_TAG>.
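
To make the failure concrete, here is a minimal sketch that wraps the two lines from the question in a helper. The class name `OriginalAttempt` and the method name `Apply` are hypothetical and only exist for illustration. On tagged input, `^[a-z]` cannot match because the string starts with `<`, the terminator branch needs whitespace right after the punctuation, and `ToLower()` also lower-cases the tag names.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the code from the question, for illustration only.
public static class OriginalAttempt
{
    public static string Apply(string toConvert)
    {
        // Same pattern and options as in the question.
        var sentenceRegex = new Regex(@"(^[a-z])|[?!.:;]\s+(.)", RegexOptions.ExplicitCapture);
        // Lower-case everything (tags included), then upper-case each whole match.
        return sentenceRegex.Replace(toConvert.ToLower(), s => s.Value.ToUpper());
    }
}
```

With plain text such as `"hello world. foo bar"` this returns `"Hello world. Foo bar"`, but for `"<B>lorem ipsum.</B> DOLOR sit."` nothing is capitalized and the result is `"<b>lorem ipsum.</b> dolor sit."`.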

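I would appreciate it if someone could help me with a regular expression that ignores (without removing) the HTML tags in the string and converts the string to sentence case.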

1 Answer:

Answer 0: (score: 0)

It may not be pretty, but it works ;)

using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        string toConvert = "<BOLD_HTML_TAG>lorem ipsum is simply dummy</BOLD_HTML_TAG> text of the printing and typesetting industry."+
                "<PARAGRAPH_TAG>LOREM ipsum has been the industry's standard dummy "+
                "text ever since the 1500s</PARAGRAPH_TAG>.";
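        // The lookbehind captures the opening tag name; the lookahead requires the matching closing tag.
        var sentenceRegex = new Regex(@"(?<=<(?<tag>\w+)>).*?(?=</\k<tag>>)", RegexOptions.ExplicitCapture);
        // Upper-case the first character and lower-case the rest.
        // Empty matches (e.g. "<b></b>") are returned as-is, because Substring(0, 1) would throw on them.
        var result = sentenceRegex.Replace(toConvert, s => s.Value.Length == 0 ? s.Value : s.Value.Substring(0, 1).ToUpper() + s.Value.Substring(1).ToLower());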

        Console.WriteLine(toConvert + "\r\n" + result);
    }
}

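The regular expression uses a named group inside the lookbehind and the lookahead to match the tags, extracts the string between them, and finally converts the first letter to upper case and the remaining letters to lower case.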
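
Note that the pattern above only rewrites text enclosed in a matching tag pair; text outside the tags is left untouched. As an alternative, here is a rough sketch of a single-pass approach that copies tags through unchanged and applies sentence casing to everything else. It is not the method from the answer. The names `TagAwareSentenceCase` and `ToSentenceCase` are hypothetical, and the sketch assumes that anything of the form `<...>` is a tag and that the terminators are the same `? ! . : ;` set used in the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, a sketch under the assumptions stated above.
public static class TagAwareSentenceCase
{
    // One token per match: a whole tag, a sentence terminator, or a letter/digit.
    // Whitespace and other characters are not matched and are therefore copied as-is.
    private static readonly Regex Token = new Regex(
        @"(?<tag><[^>]+>)|(?<end>[?!.:;])|(?<ch>[\p{L}\p{N}])",
        RegexOptions.ExplicitCapture);

    public static string ToSentenceCase(string input)
    {
        bool atSentenceStart = true; // the first letter of the text starts a sentence

        return Token.Replace(input, m =>
        {
            if (m.Groups["tag"].Success)
                return m.Value;                 // tags are ignored, not removed

            if (m.Groups["end"].Success)
            {
                atSentenceStart = true;         // the next letter begins a new sentence
                return m.Value;
            }

            // Letter or digit: upper-case at a sentence start, lower-case otherwise.
            // Digits are unaffected by casing but still clear the flag.
            string cased = atSentenceStart
                ? m.Value.ToUpperInvariant()
                : m.Value.ToLowerInvariant();
            atSentenceStart = false;
            return cased;
        });
    }
}
```

Calling `TagAwareSentenceCase.ToSentenceCase(toConvert)` on the sample from the question should capitalize the `l` after `<BOLD_HTML_TAG>` and turn `LOREM` into `Lorem` while leaving the tag names in upper case. Because a terminator does not need trailing whitespace here, abbreviations such as "e.g." would also trigger capitalization, so treat this as a starting point only.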

Regards