C#: Replace the first plain-text character in an HTML string

Time: 2014-06-26 06:45:24

Tags: c# html string replace

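I want to replace the first text character in an HTML string with my own custom-styled markup. Unfortunately, I can't get it to work in a general way for all of my examples.

Consider the following possible HTML strings: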

string str1 = "hello world";
string str2 = "<p><div>hello</div> world <div>some text</div></p>";
string str3 = "<p>hello <span>world</span></p>";
string str4 = "<p><a href=\"#\">h</a>hello world</p>";
string str5 = "<p>hello world <div>some text</div></p>";

The results should be:

str1 = "<span class=\"my-style\">h</span>ello world";
str2 = "<p><div><span class=\"my-style\">h</span>ello</div> world <div>some text</div></p>";
str3 = "<p><span class=\"my-style\">h</span>ello <span>world</span></p>";
str4 = "<p><a href=\"#\"><span class=\"my-style\">h</span></a>hello world</p>";
str5 = "<p><span class=\"my-style\">h</span>ello world <div>some text</div></p>";

In each result, the first letter 'h' has been changed to <span class="my-style">h</span>.

Can anyone help me?

2 Answers:

Answer 0: (score: 0)

The :first-letter selector can help you do this with CSS. http://www.w3schools.com/cssref/sel_firstletter.asp

Answer 1: (score: 0)

You can use the following two methods. The first extracts the first word of the inner text:

// Requires: using System.Linq; using System.Text.RegularExpressions;
private static string ExtractHtmlInnerTextFirstWord(string htmlText)
{
    // Match one or more HTML tags (opening or closing), each followed by
    // optional whitespace. Singleline lets '.' match newlines too, so the
    // HTML text is treated as a single line.
    Regex regex = new Regex("(<.*?>\\s*)+", RegexOptions.Singleline);

    // Replace each run of tags (and trailing whitespace) with one space,
    // trim the ends, and take the first space-separated word.
    string resultText = regex.Replace(htmlText, " ").Trim().Split(' ').FirstOrDefault();

    return resultText;
}

Note: credit to http://www.codeproject.com/Tips/477066/Extract-inner-text-from-HTML-using-Regex

Then replace the first word with its edited value (this method also calls ExtractHtmlInnerTextFirstWord):

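private static string ReplaceHtmlInnerText(string htmlText)
{
    // Get the first word of the inner text.
    string firstWord = ExtractHtmlInnerTextFirstWord(htmlText);

    // Nothing to wrap if the HTML contains no text.
    if (string.IsNullOrEmpty(firstWord))
    {
        return htmlText;
    }

    // Wrap only the first character of the first word in a span.
    // (string.Replace on that character would wrap every occurrence of it.)
    string replacedFirstWord = "<span class=\"my-style\">" + firstWord[0] + "</span>" + firstWord.Substring(1);

    // Replace only the first occurrence of the word. The evaluator keeps
    // any '$' in the word from being read as a substitution pattern.
    // Caveat: the match is not tag-aware, so it can hit text inside a tag.
    var regex = new Regex(Regex.Escape(firstWord));
    string replacedText = regex.Replace(htmlText, m => replacedFirstWord, 1);

    return replacedText;
}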

You can call the method like this:

private static void Main(string[] args)
{
    string str1 = "hello world";
    string str2 = "<p><div>hello</div> world <div>some text</div></p>";
    Console.WriteLine("Original: " + str1);
    Console.WriteLine("Edited value: " + ReplaceHtmlInnerText(str1));
    Console.WriteLine("Original: " + str2);
    Console.WriteLine("Edited value: " + ReplaceHtmlInnerText(str2));
    Console.Read();
}

Output:

Original: hello world 
Edited value: <span class="my-style">h</span>ello world 
Original: <p><div>hello</div> world <div>some text</div></p> 
Edited value: <p><div><span class="my-style">h</span>ello</div> world <div>some text</div></p>
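
One limitation of the regex approach: the final replacement searches the whole string, tags included. For the question's str4 the first word is "h", and the first "h" in the string is the one inside href, so the span would land inside the tag. A minimal sketch of a tag-aware alternative follows. It is not the answerer's method; the names FirstLetterWrapper and WrapFirstTextCharacter are hypothetical, and it assumes '<' and '>' appear only as tag delimiters (no comments, scripts, or '>' inside attribute values). It also treats an entity such as &amp; as ordinary text, so a real HTML parser is likely the safer choice for arbitrary input.

```csharp
using System;

public static class FirstLetterWrapper
{
    // Hypothetical helper: wraps the first non-whitespace character that
    // lies outside any tag in <span class="cssClass">...</span>.
    // Assumption: '<' and '>' occur only as tag delimiters.
    public static string WrapFirstTextCharacter(string html, string cssClass)
    {
        if (string.IsNullOrEmpty(html))
        {
            return html;
        }

        bool insideTag = false;
        for (int i = 0; i < html.Length; i++)
        {
            char c = html[i];

            // Track whether the scanner is currently inside a tag.
            if (c == '<') { insideTag = true; continue; }
            if (c == '>') { insideTag = false; continue; }

            // Skip tag contents and whitespace between tags.
            if (insideTag || char.IsWhiteSpace(c))
            {
                continue;
            }

            // First real text character: splice the span in around it.
            string wrapped = "<span class=\"" + cssClass + "\">" + c + "</span>";
            return html.Substring(0, i) + wrapped + html.Substring(i + 1);
        }

        // No text content found; return the input unchanged.
        return html;
    }
}
```

Calling FirstLetterWrapper.WrapFirstTextCharacter(str4, "my-style") on the question's str4 gives <p><a href="#"><span class="my-style">h</span></a>hello world</p>, which matches the expected result in the question.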