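I have some website content that contains abbreviations. I have a list of recognised abbreviations for the site, along with their explanations. I want to create a regular expression that will let me replace every recognised abbreviation found in the content with some markup.
For example:
Content: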
This is just a little test of the memb to see if it gets picked up. Deb of course should also be caught here.
Abbreviations:
memb = Member; deb = Debut;
Result:
This is just a little test of the [a title="Member"]memb[/a] to see if it gets picked up. [a title="Debut"]Deb[/a] of course should also be caught here.
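(This is just simple example markup.)
Thanks.
Edit:
CraigD's answer is almost there, but there are problems. I only want to match whole words. I also want to keep the original capitalisation of each replaced word, so that deb stays deb and Deb stays Deb, as in the original text. For example, this input: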
This is just a little test of the memb. And another memb, but not amemba. Deb of course should also be caught here.deb!
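Answer 0 (score: 10)
First, you need to Regex.Escape() all input strings.
Then you can look for them in the string and iteratively replace them with the markup you have in mind: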
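string abbr = "memb";
string word = "Member";
// Verbatim string (@): in a regular literal, "\b" is a backspace character, not a word boundary.
string pattern = String.Format(@"\b{0}\b", Regex.Escape(abbr));
string substitute = String.Format("[a title=\"{0}\"]{1}[/a]", word, abbr);
// input is the content string to process.
string output = Regex.Replace(input, pattern, substitute);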
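Edit: I had asked whether a simple String.Replace() wouldn't be enough, but I can see why a regex is preferable: you can enforce "whole word" replacement only by building a pattern that uses word boundary anchors.
You can build a single pattern from all the escaped input strings, like this: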
\b(?:{abbr_1}|{abbr_2}|{abbr_3}|{abbr_n})\b
Then use a match evaluator to find the right replacement. That way you avoid iterating over the input string multiple times.
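The single-pattern approach described above could be sketched roughly as follows. This is only an illustration under some assumptions: the class name AbbreviationMarkup and the method Annotate are made up for this example, the markup is the placeholder markup from the question, and the abbreviations are assumed to start and end with word characters (otherwise \b will not anchor where you expect).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class AbbreviationMarkup
{
    public static string Annotate(string input, IDictionary<string, string> abbreviations)
    {
        // An empty alternation would match at every word boundary, so bail out early.
        if (abbreviations.Count == 0) return input;

        // Case-insensitive copy so "Deb" finds the "deb" entry.
        var lookup = new Dictionary<string, string>(abbreviations, StringComparer.OrdinalIgnoreCase);

        // \b(?:abbr_1|abbr_2|...)\b with every key escaped.
        string pattern = @"\b(?:" + string.Join("|", lookup.Keys.Select(k => Regex.Escape(k))) + @")\b";

        return Regex.Replace(
            input,
            pattern,
            m =>
            {
                string meaning;
                // m.Value keeps the casing found in the text; fall back to it if the lookup misses.
                return lookup.TryGetValue(m.Value, out meaning)
                    ? string.Format("[a title=\"{0}\"]{1}[/a]", meaning, m.Value)
                    : m.Value;
            },
            RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }
}
```

The input is scanned once regardless of how many abbreviations there are, and the evaluator decides the replacement per match.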
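Answer 1 (score: 4)
Not sure how well this will scale to a large word list, but I think it should give the output you want (although in your question the 'result' seems to be the same as the 'content')?
Anyway, let me know if this is what you're after.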
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            var input = @"This is just a little test of the memb to see if it gets picked up.
Deb of course should also be caught here.";
            var dictionary = new Dictionary<string,string>
            {
                {"memb", "Member"}
                ,{"deb","Debut"}
            };
            // Builds the alternation "(memb)|(deb)" from the dictionary keys.
            var regex = "(" + String.Join(")|(", dictionary.Keys.ToArray()) + ")";
            foreach (Match metamatch in Regex.Matches(input
                , regex /*@"(memb)|(deb)"*/
                , RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture))
            {
                // Replaces each matched abbreviation with its full term.
                input = input.Replace(metamatch.Value, dictionary[metamatch.Value.ToLower()]);
            }
            Console.Write (input);
            Console.ReadLine();
        }
    }
}
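Answer 2 (score: 1)
I doubt it will perform better than a plain string.Replace, so if performance is critical, measure it (after refactoring a bit to use a compiled regex). You can write the regex version as: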
var abbrsWithPipes = "(abbr1|abbr2)";
var regex = new Regex(abbrsWithPipes);
return regex.Replace(html, m => GetReplaceForAbbr(m.Value));
You need to implement GetReplaceForAbbr, which receives the specific abbreviation that was matched.
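One way GetReplaceForAbbr and the compiled regex might fit together is sketched below. The class name AbbrReplacer, the hard-coded sample abbreviations, and the placeholder markup are assumptions for illustration; the \b anchors and IgnoreCase are also additions to the pattern shown above, included so that only whole words match.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the snippet above.
public static class AbbrReplacer
{
    // Sample data; in practice this would come from the site's abbreviation list.
    private static readonly Dictionary<string, string> Abbrs =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "memb", "Member" },
            { "deb", "Debut" }
        };

    // Built once and reused; RegexOptions.Compiled trades startup cost for faster matching.
    private static readonly Regex AbbrRegex = new Regex(
        @"\b(?:memb|deb)\b",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Receives the matched text and returns its replacement markup.
    public static string GetReplaceForAbbr(string abbr)
    {
        string meaning;
        return Abbrs.TryGetValue(abbr, out meaning)
            ? string.Format("[a title=\"{0}\"]{1}[/a]", meaning, abbr)
            : abbr; // unknown text is returned unchanged
    }

    public static string Replace(string html)
    {
        return AbbrRegex.Replace(html, m => GetReplaceForAbbr(m.Value));
    }
}
```

Whether the compiled version is actually faster for a given abbreviation list and content size is something to measure rather than assume.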
Answer 3 (score: 1)
I'm doing exactly what you're looking for in my application, and this works for me. The str parameter is your content:
public static string GetGlossaryString(string str)
{
    List<string> glossaryWords = GetGlossaryItems();//this collection would contain your abbreviations; you could just make it a Dictionary so you can have the abbreviation-full term pairs and use them in the loop below
    str = string.Format(" {0} ", str);//quick and dirty way to also search the first and last word in the content.
    foreach (string word in glossaryWords)
        // Regex.Escape keeps any regex metacharacters in the word from being treated as pattern syntax.
        str = Regex.Replace(str, "([\\W])(" + Regex.Escape(word) + ")([\\W])", "$1<span class='glossaryItem'>$2</span>$3", RegexOptions.IgnoreCase);
    return str.Trim();
}
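One thing to be aware of with the pattern above: each match consumes the non-word character on either side, so when the same glossary word appears twice with a single separator between the occurrences (for example "deb deb"), the second occurrence has no leading character left to match. A possible variation, sketched here with a hypothetical GlossaryMarkup.Wrap helper, uses lookarounds so the neighbouring characters are checked but not consumed, which also removes the need for the space padding:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, shown only to illustrate the lookaround variant.
public static class GlossaryMarkup
{
    public static string Wrap(string content, string word)
    {
        // (?<!\w) : not preceded by a word character
        // (?!\w)  : not followed by a word character
        // Neither assertion consumes input, so adjacent occurrences are all found.
        string pattern = @"(?<!\w)" + Regex.Escape(word) + @"(?!\w)";

        // $0 is the whole match, so the original casing is kept.
        return Regex.Replace(
            content,
            pattern,
            "<span class='glossaryItem'>$0</span>",
            RegexOptions.IgnoreCase);
    }
}
```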
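Answer 4 (score: 1)
For anyone interested, here is my final solution. It is for a .NET user control. It uses a single pattern with a match evaluator, as suggested by Tomalak, so there is no foreach loop. It's an elegant solution, and it gives me the correct output for the sample input while preserving the correct casing of the matched strings.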
public partial class Abbreviations : System.Web.UI.UserControl
{
    private Dictionary<String, String> dictionary = DataHelper.GetAbbreviations();

    protected void Page_Load(object sender, EventArgs e)
    {
        string input = "This is just a little test of the memb. And another memb, but not amemba to see if it gets picked up. Deb of course should also be caught here.deb!";
        var regex = "\\b(?:" + String.Join("|", dictionary.Keys.ToArray()) + ")\\b";
        MatchEvaluator myEvaluator = new MatchEvaluator(GetExplanationMarkup);
        input = Regex.Replace(input, regex, myEvaluator, RegexOptions.IgnoreCase);
        litContent.Text = input;
    }

    private string GetExplanationMarkup(Match m)
    {
        return string.Format("<b title='{0}'>{1}</b>", dictionary[m.Value.ToLower()], m.Value);
    }
}
The output is shown below. Note that it matches only whole words, and that the casing of the original string is preserved:
This is just a little test of the <b title='Member'>memb</b>. And another <b title='Member'>memb</b>, but not amemba to see if it gets picked up. <b title='Debut'>Deb</b> of course should also be caught here.<b title='Debut'>deb</b>!