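I need to delete certain words from a text together with the separators next to them. I have managed to delete the words, but I don't know how to delete the separators at the same time. Any suggestions?
Currently, I have: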
static void Main(string[] args)
{
    Program p = new Program();
    string text = "";
    text = p.ReadText("Duomenys.txt", text);
    string[] wordsToDelete = { "Hello", "Thanks", "kinda" };
    char[] separators = { ' ', '.', ',', '!', '?', ':', ';', '(', ')', '\t' };
    p.DeleteWordsFromText(text, wordsToDelete, separators);
}
public string ReadText(string file, string text)
{
    text = File.ReadAllText(file);
    return text;
}
public void DeleteWordsFromText(string text, string[] wordsToDelete, char[] separators)
{
    Console.WriteLine(text);
    for (int i = 0; i < wordsToDelete.Length; i++)
    {
        text = Regex.Replace(text, wordsToDelete[i], String.Empty);
    }
    Console.WriteLine("-------------------------------------------");
    Console.WriteLine(text);
}
The result should be:
how are you?
I am good.
What I get:
, how are you?
, I am . good.
Duomenys.txt
Hello, how are you?
Thanks, I am kinda. good.
Answer 0 (score: 2)
You can build the regular expression like this:
var regex = new Regex(@"\b("
+ string.Join("|", wordsToDelete.Select(Regex.Escape)) + ")("
+ string.Join("|", separators.Select(c => Regex.Escape(new string(c, 1)))) + ")?");
Explanation:
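A minimal sketch of how this pattern behaves, assuming the wordsToDelete and separators arrays from the question. The class name OptionalSeparatorDemo and the helper BuildRegex are made up for illustration. The pattern has the shape \b(word1|word2|...)(sep1|sep2|...)?, so it removes each listed word plus at most one separator character after it. It has no trailing \b, so a word such as "Hellos" would lose its "Hello" prefix.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class OptionalSeparatorDemo
{
    // Hypothetical helper: builds \b(word1|word2|...)(sep1|sep2|...)?
    public static Regex BuildRegex(string[] wordsToDelete, char[] separators)
    {
        // Escape every word and every separator so they are matched literally.
        string words = string.Join("|", wordsToDelete.Select(Regex.Escape));
        string seps = string.Join("|", separators.Select(c => Regex.Escape(c.ToString())));
        return new Regex(@"\b(" + words + ")(" + seps + ")?");
    }

    public static void Main()
    {
        string[] wordsToDelete = { "Hello", "Thanks", "kinda" };
        char[] separators = { ' ', '.', ',', '!', '?', ':', ';', '(', ')', '\t' };
        Regex regex = BuildRegex(wordsToDelete, separators);

        // Brackets make the leftover whitespace visible.
        Console.WriteLine("[" + regex.Replace("Hello, how are you?", "") + "]");
        // prints "[ how are you?]" (the comma is removed, the space after it is not)
        Console.WriteLine("[" + regex.Replace("Thanks, I am kinda. good.", "") + "]");
        // prints "[ I am  good.]"
    }
}
```

Only one separator is consumed per word, so the output still contains the spaces that followed "Hello," and "kinda."; trimming them would need an extra step such as a quantifier on the separator group.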
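Answer 1 (score: 2)
You can build a regular expression like this:
\b(?:Hello|Thanks|kinda)\b[ .,!?:;() ]*
where \b(?:Hello|Thanks|kinda)\b
matches any of the words to delete as a whole word, and [ .,!?:;() ]*
matches zero or more of the separators that follow the deleted word.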
char[] separators = { ' ', '.', ',', '!', '?', ':', ';', '(', ')', '\t' };
string[] wordsToDelete = { "Hello", "Thanks", "kinda" };
string SepPattern = new String(separators).Replace(@"\", @"\\").Replace("^", @"\^").Replace("-", @"\-").Replace("]", @"\]");
var pattern = $@"\b(?:{string.Join("|", wordsToDelete.Select(Regex.Escape))})\b[{SepPattern}]*";
// => \b(?:Hello|Thanks|kinda)\b[ .,!?:;() ]*
Regex rx = new Regex(pattern, RegexOptions.Compiled);
// RegexOptions.IgnoreCase can be added to the above flags for case insensitive matching: RegexOptions.IgnoreCase | RegexOptions.Compiled
DeleteWordsFromText("Hello, how are you?", rx);
DeleteWordsFromText("Thanks, I am kinda. good.", rx);
Here is the DeleteWordsFromText method:
public static void DeleteWordsFromText(string text, Regex p)
{
    Console.WriteLine($"---- {text} ----");
    Console.WriteLine(p.Replace(text, ""));
}
Output:
---- Hello, how are you? ----
how are you?
---- Thanks, I am kinda. good. ----
I am good.
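Notes:
string SepPattern = new String(separators).Replace(@"\", @"\\").Replace("^", @"\^").Replace("-", @"\-").Replace("]", @"\]"); - this is the separator pattern that will be used inside a character class. Since only the ^, -, \ and ] characters need escaping inside a character class, only those characters are escaped.
$@"\b(?:{string.Join("|", wordsToDelete.Select(Regex.Escape))})\b" - this builds an alternation of the words to delete and matches them only as whole words.
Pattern details
\b - a word boundary
(?: - start of a non-capturing group:
Hello - the word Hello
| - or
Thanks - the word Thanks
| - or
kinda - the word kinda
) - end of the group
\b - a word boundary
[ .,!?:;() ]* - zero or more characters from the character class.
See the regex demo.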
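The character-class escaping described in the notes can be checked with separators that are special inside [...]. This is only a sketch: the class name CharClassEscapeDemo, the helper names and the sample separators are assumptions made for illustration, not part of the answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CharClassEscapeDemo
{
    // Escapes only the characters that are special inside a character class.
    // The backslash is replaced first so that backslashes added by the later
    // replacements are not escaped a second time.
    public static string EscapeForCharClass(char[] separators)
    {
        return new string(separators)
            .Replace(@"\", @"\\")
            .Replace("^", @"\^")
            .Replace("-", @"\-")
            .Replace("]", @"\]");
    }

    // Builds \b(?:word1|word2|...)\b[separators]*
    public static Regex BuildRegex(string[] wordsToDelete, char[] separators)
    {
        string words = string.Join("|", wordsToDelete.Select(Regex.Escape));
        string sepClass = EscapeForCharClass(separators);
        return new Regex(@"\b(?:" + words + @")\b[" + sepClass + "]*");
    }

    public static void Main()
    {
        // Hypothetical separators chosen because each is special inside [...].
        char[] separators = { ' ', ',', '-', ']', '^', '\\' };
        Regex rx = BuildRegex(new[] { "Hello" }, separators);

        Console.WriteLine(rx.ToString());
        // prints \b(?:Hello)\b[ ,\-\]\^\\]*
        Console.WriteLine(rx.Replace(@"Hello-]^\ world", ""));
        // prints world
    }
}
```

Without the escaping, a separator such as ] would close the character class early and - could be read as a range, so the pattern would either fail to compile or match the wrong characters.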
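Answer 2 (score: 1)
I wouldn't use a regex. Three months from now you will no longer understand the regex, and fixing bugs in it will be hard.
I would use a simple loop. Everyone will understand it: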
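The code that originally accompanied this answer is not preserved here. The following is only a sketch of what a loop-based approach could look like, not the answerer's code. The names LoopDeleteDemo, DeleteWords and MatchWordAt are made up, and "whole word" is approximated by checking that the neighbouring characters are not letters or digits.

```csharp
using System;
using System.Text;

public static class LoopDeleteDemo
{
    // Removes each listed word (whole words only) together with any run of
    // separator characters that directly follows it.
    public static string DeleteWords(string text, string[] wordsToDelete, char[] separators)
    {
        var result = new StringBuilder();
        int i = 0;
        while (i < text.Length)
        {
            int matched = MatchWordAt(text, i, wordsToDelete);
            if (matched > 0)
            {
                // Skip the word, then every separator after it.
                i += matched;
                while (i < text.Length && Array.IndexOf(separators, text[i]) >= 0)
                {
                    i++;
                }
            }
            else
            {
                result.Append(text[i]);
                i++;
            }
        }
        return result.ToString();
    }

    // Returns the length of the word that starts at pos, or 0 if none does.
    private static int MatchWordAt(string text, int pos, string[] words)
    {
        // A word cannot start in the middle of another word.
        if (pos > 0 && char.IsLetterOrDigit(text[pos - 1])) return 0;
        foreach (string word in words)
        {
            int end = pos + word.Length;
            if (word.Length == 0 || end > text.Length) continue;
            if (string.CompareOrdinal(text, pos, word, 0, word.Length) != 0) continue;
            // The word must also end at a non-word character or at the end of the text.
            if (end == text.Length || !char.IsLetterOrDigit(text[end])) return word.Length;
        }
        return 0;
    }

    public static void Main()
    {
        string[] wordsToDelete = { "Hello", "Thanks", "kinda" };
        char[] separators = { ' ', '.', ',', '!', '?', ':', ';', '(', ')', '\t' };
        Console.WriteLine(DeleteWords("Hello, how are you?", wordsToDelete, separators));
        // prints "how are you?"
        Console.WriteLine(DeleteWords("Thanks, I am kinda. good.", wordsToDelete, separators));
        // prints "I am good."
    }
}
```

Line breaks are kept because '\n' is not in the separator list, so the two lines of Duomenys.txt stay on separate lines.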