Question

I have a very large string (HTML) that contains special tokens. Every token starts with "#" and ends with "#".

A simple example:

<html>
<body>
      <p>Hi #Name#, You should come and see this #PLACE# - From #SenderName#</p>
</body>
</html>

I need code that can detect these tokens and put them into a list: 0 - #Name#, 1 - #PLACE#, 2 - #SenderName#

I know I could probably use a regular expression, but do you have any ideas?

Answer 1

Yes, you can use a regular expression.

string test = "Hi #Name#, You should come and see this #PLACE# - From #SenderName#";
Regex reg = new Regex(@"#\w+#");
foreach (Match match in reg.Matches(test))
{
    Console.WriteLine(match.Value);
}

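As you may have guessed, \w matches any word character (letters, digits, and the underscore), and + means it must occur one or more times. You can find more information in the MSDN documentation (this is the .NET 4 version; other versions are available there as well).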
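The question asks for the tokens in a list rather than printed to the console. A minimal sketch of that, using the same #\w+# pattern; the TokenFinder class and FindTokens method are hypothetical names introduced only for this illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TokenFinder
{
    // Hypothetical helper (not from the original answer): returns every
    // #Token# found in the input, in order of appearance, '#' included.
    public static List<string> FindTokens(string input)
    {
        var tokens = new List<string>();
        foreach (Match match in Regex.Matches(input, @"#\w+#"))
        {
            tokens.Add(match.Value);
        }
        return tokens;
    }

    public static void Main()
    {
        string html = "<p>Hi #Name#, You should come and see this #PLACE# - From #SenderName#</p>";
        List<string> tokens = FindTokens(html);

        // Print each token with its position in the list.
        for (int i = 0; i < tokens.Count; i++)
        {
            Console.WriteLine("{0} - {1}", i, tokens[i]);
        }
    }
}
```

Because \w+ requires at least one word character, an empty pair such as ## or text containing spaces between two # characters is not treated as a token.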

Answer 2

You can try:

// using System.Text.RegularExpressions;
// pattern = any number of arbitrary characters between #.
var pattern = @"#(.*?)#";
var matches = Regex.Matches(htmlString, pattern);

foreach (Match m in matches) {
    Console.WriteLine(m.Groups[1]);
}

This answer was inspired by this SO question.

Answer 3

A variant without Regex, if you prefer:

var splitstring = myHtmlString.Split('#');
var tokens = new List<string>();
for( int i = 1; i < splitstring.Length; i+=2){
  tokens.Add(splitstring[i]);
}

Answer 4

foreach (Match m in Regex.Matches(input, @"#\w+#"))
    Console.WriteLine("'{0}' found at index {1}.",  m.Value, m.Index);

Answer 5

Try this:

var result = html.Split('#')
                    .Select((s, i) => new {s, i})
                    .Where(p => p.i%2 == 1)
                    .Select(t => t.s);

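Explanation:

Line 1: we split the text on the character "#".

Line 2: we project each piece into a new anonymous type that holds the string's position in the array together with the string itself.

Line 3: we filter the list of anonymous objects down to those with an odd index, effectively selecting every other string. This picks out the strings enclosed in hash characters rather than the ones outside them.

Line 4: we strip the index and return only the string from the anonymous type.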
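The odd-index trick relies on the "#" characters being perfectly paired. The sketch below wraps the same query in a hypothetical OddSegments helper (the name and the sample inputs are assumptions made for illustration) and shows what happens when a stray "#" appears in the text:

```csharp
using System;
using System.Linq;

public static class SplitDemo
{
    // Hypothetical helper: the same Split/Select/Where/Select chain as above,
    // materialized into an array.
    public static string[] OddSegments(string text)
    {
        return text.Split('#')
                   .Select((s, i) => new { s, i })
                   .Where(p => p.i % 2 == 1)
                   .Select(p => p.s)
                   .ToArray();
    }

    public static void Main()
    {
        // Balanced input splits into "Hi ", "Name", " and ", "Place", "!".
        // The odd positions hold the token names.
        foreach (string token in OddSegments("Hi #Name# and #Place#!"))
        {
            Console.WriteLine("[" + token + "]");   // [Name] then [Place]
        }

        // A stray '#' shifts the parity: the pieces are "a ", " b ", "Name", "".
        // The odd positions are now " b " and "", so the real token is missed.
        foreach (string token in OddSegments("a # b #Name#"))
        {
            Console.WriteLine("[" + token + "]");   // [ b ] then []
        }
    }
}
```

If the HTML can contain unpaired "#" characters (for example in CSS colors or URL fragments), a regex that constrains what may appear between the delimiters is likely the safer choice.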

Answer 6

Use:

MatchCollection matches = Regex.Matches(mytext, @"#(\w+)#");

foreach(Match m in matches)
{
    Console.WriteLine(m.Groups[1].Value);
}

Answer 7

A naive solution:

var result = Regex
    .Matches(html, @"\#([^\#.]*)\#")
    .OfType<Match>()
    .Select(x => x.Groups[1].Value)
    .ToList();

Answer 8

A LINQ solution:

        string s = @"<p>Hi #Name#, 
          You should come and see this #PLACE# - From #SenderName#</p>";

        var result = s.Split('#').Where((x, y) => y % 2 != 0).Select(x => x);

Answer 9

Use the Regex.Matches method with a pattern such as:

#[^#]+#

This is probably the most naive approach.

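If you want to avoid including the "#" characters in the matched output, the pattern needs adjusting; lookaround assertions may be worth investigating: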

(?<=#)[^#]+(?=#)

(With this pattern the match value is 'hello' rather than '#hello#', so you don't need to do any further trimming.)
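One caveat with the lookaround pattern: lookarounds do not consume the "#" characters, so the closing "#" of one token can also serve as the opening "#" for the text that follows it. The sketch below compares it with a capture-group pattern, #([^#]+)#, which consumes both delimiters. The class name, method names, and sample input are hypothetical, chosen only for this illustration:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Lookaround version: the '#' delimiters are asserted but not consumed.
    public static string[] WithLookaround(string input)
    {
        return Regex.Matches(input, @"(?<=#)[^#]+(?=#)")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    // Capture-group version: both '#' delimiters are consumed by the match,
    // and the token name is read from group 1.
    public static string[] WithCaptureGroup(string input)
    {
        return Regex.Matches(input, @"#([^#]+)#")
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value)
                    .ToArray();
    }

    public static void Main()
    {
        string input = "Hi #Name#, see #PLACE#!";

        // The lookaround version also returns the text between the two tokens.
        Console.WriteLine(string.Join("|", WithLookaround(input)));   // Name|, see |PLACE

        // The capture-group version returns only the token names.
        Console.WriteLine(string.Join("|", WithCaptureGroup(input))); // Name|PLACE
    }
}
```

For input with more than one token, the capture-group form is probably the one that matches the intent of the question.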

Answer 10

This gives you a list of the tokens, as you requested:

var tokens = new List<string>();
var matches = new Regex("(#.*?#)").Matches(html);

foreach (Match m in matches) 
    tokens.Add(m.Groups[1].Value);

Edit: If you don't want the pound characters included, just move them outside the parentheses in the Regex string (see Pablo's answer).

Detecting specific tokens in a string. C#
