Question

How do I write a regular expression to remove the sentence after a specific tab?

For example, I have this text in a RichTextBox:

a   00001740    0.125   0   able#1  (usually followed by `to') having the necessary means or skill or know-how or authority to do something; "able to swim"; "she was able to program her computer"; "we were at last able to buy a car"; "able to get a grant for the project"
a   00002098    0   0.75    unable#1    (usually followed by `to') not having the necessary means or skill or know-how; "unable to get to town without a car"; "unable to obtain funds"
a   00002312    0   0   dorsal#2 abaxial#1  facing away from the axis of an organ or organism; "the abaxial surface of a leaf is the underside or side facing away from the stem"

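This text is from SentiWordNet. I want to remove the sentence after the fifth tab: for example, after the word able#1 the sentence (i.e. its gloss) should be omitted, and likewise after the next word, unable#1, its gloss should be omitted.

What regex would remove the glosses of the words in a SentiWordNet text file? Is there a way to do this, or can someone give me a small sample or void method?

The output should look like this: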

a   00001740    0.125   0   able#1
a   00002098    0   0.75    unable#1
a   00002312    0   0   dorsal#2 abaxial#1

Answer 1

You can instead look for # followed by digits, so the regex would be

(?<=#\d+)[^#]*$

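[^#]* matches zero or more characters other than #.

(?<=#\d+) is a lookbehind: it checks that a particular pattern (# followed by digits) occurs immediately before the text matched by [^#]*.

$ denotes the end of the string.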

Or, to match the last tab and everything after it:

\t[^\t]+$

You can then use the regex Replace function:

input=Regex.Replace(input,regex,"");
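Since the text in the question spans several lines, here is a minimal sketch of applying the second pattern to multi-line input. The class and method names (GlossStripper, StripLastField) are hypothetical, lines are assumed to be separated by \n, and adding \n to the excluded characters is my own adjustment so that a match cannot run across a line break. Without RegexOptions.Multiline, $ would only match at the end of the whole string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GlossStripper
{
    // Hypothetical helper (not part of the answer): removes everything from
    // the last tab to the end of each line.
    public static string StripLastField(string text)
    {
        // RegexOptions.Multiline makes $ match at the end of every line,
        // not only at the end of the whole string.
        // [^\t\n]+ excludes the newline so a match stays on one line.
        return Regex.Replace(text, @"\t[^\t\n]+$", "", RegexOptions.Multiline);
    }
}
```

This relies on the gloss being the last tab-separated field and containing no tabs itself, as in the sample lines.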

Answer 2

This should do the job:

string text = @"a   00001740    0.125   0   able#1  (usually followed by `to') having the necessary means or skill or know-how or... ";

string res = Regex.Replace(text, @"((?:[^\t]+\t){5}).+$", "$1");
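For comparison, the same result can probably be reached without a regex by splitting on tabs. This is a different technique from the one in the answer, sketched under the assumptions that lines are separated by \n and that the first five tab-separated fields are the ones to keep; FieldTrimmer and KeepFirstFields are hypothetical names.

```csharp
using System;
using System.Linq;

public static class FieldTrimmer
{
    // Hypothetical helper: keeps only the first `count` tab-separated
    // fields of each line and drops the rest (the gloss).
    public static string KeepFirstFields(string text, int count = 5)
    {
        var lines = text.Split('\n')
            .Select(line => string.Join("\t", line.Split('\t').Take(count)));
        return string.Join("\n", lines);
    }
}
```

Unlike the "$1" replacement above, which keeps the tab that follows the fifth field, this version leaves no trailing tab on each line.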

C# regex to remove the sentence after a specific tab
