Question

I am using the following code to convert a string to sentence case.

var sentenceRegex = new Regex(@"(^[a-z])|[?!.:;]\s+(.)", RegexOptions.ExplicitCapture);
var result = sentenceRegex.Replace(toConvert.ToLower(), s => s.Value.ToUpper());

However, it fails when a sentence starts with an HTML tag, as in the example below.

I want to skip the HTML tags and convert the text to sentence case. Current text:

<BOLD_HTML_TAG>lorem ipsum is simply dummy</BOLD_HTML_TAG> text of the printing and typesetting industry.
<PARAGRAPH_TAG>LOREM ipsum has been the industry's standard dummy
text ever since the 1500s</PARAGRAPH_TAG>.

After sentence casing, the output should look like this:

<BOLD_HTML_TAG>Lorem ipsum is simply dummy</BOLD_HTML_TAG> text of the
printing and typesetting industry. <PARAGRAPH_TAG>Lorem ipsum has been
the industry's standard dummy text ever since the
1500s</PARAGRAPH_TAG>.

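I would appreciate it if someone could help me with a regular expression that ignores (but does not remove) the HTML tags in the string and converts the string to sentence case.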

Answer 1

It may not be pretty, but it works ;)

using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        string toConvert = "<BOLD_HTML_TAG>lorem ipsum is simply dummy</BOLD_HTML_TAG> text of the printing and typesetting industry."+
                "<PARAGRAPH_TAG>LOREM ipsum has been the industry's standard dummy "+
                "text ever since the 1500s</PARAGRAPH_TAG>.";
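        // Match the text between <tag> and its matching </tag>; the tags themselves
        // sit in the lookbehind/lookahead, so they are not part of the match.
        var sentenceRegex = new Regex(@"(?<=<(?<tag>\w+)>).*?(?=</\k<tag>>)", RegexOptions.ExplicitCapture);
        // Upper-case the first character and lower-case the rest.
        // An empty element such as <b></b> yields an empty match, so guard against it.
        var result = sentenceRegex.Replace(toConvert, s => s.Value.Length == 0
            ? s.Value
            : char.ToUpper(s.Value[0]) + s.Value.Substring(1).ToLower());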

        Console.WriteLine(toConvert + "\r\n" + result);
    }
}

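The regex matches the tags with a named group inside a lookbehind and a lookahead, which leaves only the text between them as the match. The replacement then upper-cases the first letter of that text and lower-cases the rest.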
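
Note that the regex above only touches text enclosed in a tag pair, so sentences that sit entirely outside any tag are left as they are. If those need sentence casing too, one alternative is a small character scanner instead of a regex. The following is only a sketch under a few assumptions: `SentenceCaser` and `ToSentenceCase` are hypothetical names, not part of the code above; anything between `<` and `>` is treated as a tag and copied verbatim; and a sentence ends at one of `. ? ! : ;` (the set used in the question) followed by whitespace or a tag.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not taken from the original answer.
public static class SentenceCaser
{
    public static string ToSentenceCase(string input)
    {
        var sb = new StringBuilder(input.Length);
        bool insideTag = false;        // true while between '<' and '>'
        bool capitalizeNext = true;    // next letter or digit starts a sentence
        bool afterTerminator = false;  // previous text character was one of . ? ! : ;

        foreach (char c in input)
        {
            if (insideTag)
            {
                // Tag content is copied unchanged; '>' closes the tag.
                insideTag = c != '>';
            }
            else if (c == '<')
            {
                // A tag directly after a terminator also starts a new sentence.
                insideTag = true;
                if (afterTerminator) capitalizeNext = true;
                afterTerminator = false;
            }
            else if (char.IsLetterOrDigit(c))
            {
                sb.Append(capitalizeNext ? char.ToUpperInvariant(c)
                                         : char.ToLowerInvariant(c));
                capitalizeNext = false;
                afterTerminator = false;
                continue;
            }
            else if (char.IsWhiteSpace(c))
            {
                // Terminator followed by whitespace ends the sentence.
                if (afterTerminator) capitalizeNext = true;
                afterTerminator = false;
            }
            else
            {
                // Punctuation: remember whether it is a sentence terminator.
                afterTerminator = ".?!:;".IndexOf(c) >= 0;
            }
            sb.Append(c);
        }
        return sb.ToString();
    }
}
```

Calling `SentenceCaser.ToSentenceCase(toConvert)` in place of the `Replace` call keeps tag names such as `<BOLD_HTML_TAG>` untouched while casing the text inside and outside them. This is not an HTML parser: a literal `<` in the text, or a `>` inside an attribute value, would confuse it, and abbreviations followed by a space (for example "e.g. ") are treated as sentence ends.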

Regards
