C# regular expression to remove the sentence after a specific tab

Time: 2013-03-06 14:25:39

Tags: c# regex sentiment-analysis

How can I write a regular expression that removes the sentence following a specific tab?

For example, this is the text in my RichTextBox:

a   00001740    0.125   0   able#1  (usually followed by `to') having the necessary means or skill or know-how or authority to do something; "able to swim"; "she was able to program her computer"; "we were at last able to buy a car"; "able to get a grant for the project"
a   00002098    0   0.75    unable#1    (usually followed by `to') not having the necessary means or skill or know-how; "unable to get to town without a car"; "unable to obtain funds"
a   00002312    0   0   dorsal#2 abaxial#1  facing away from the axis of an organ or organism; "the abaxial surface of a leaf is the underside or side facing away from the stem"  

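This text comes from SentiWordNet. I want to remove the sentence that follows the fifth tab: for example, the sentence after the word able#1 (i.e. its gloss) should be dropped, and likewise the gloss after the next word, unable#1.

I need a regular expression that removes the glosses of the words in the SentiWordNet text file. Is there a way to do this, or could someone give me a small sample?

The output should look like this: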

a   00001740    0.125   0   able#1
a   00002098    0   0.75    unable#1
a   00002312    0   0   dorsal#2 abaxial#1

2 Answers:

Answer 0: (score: 0)

You can instead look for # followed by digits, so the regex would be

(?<=#\d+)[^#]*$
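[^#]* matches zero or more characters other than #.

(?<=#\d+) is a lookbehind: it checks that a specific pattern (# followed by digits) occurs immediately before the text matched by [^#]*.

$ marks the end of the string.

Alternatively, match the last tab and everything after it:

\t[^\t]+$

You can then use the regex Replace function, with the chosen pattern in regex: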

input=Regex.Replace(input,regex,"");
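
As a rough sketch of how either pattern might be applied to the whole text rather than to a single line: the class and method names below are made up for illustration, the sample lines are shortened, and RegexOptions.Multiline is an addition (without it, $ only matches at the very end of the input). The sketch assumes the fields are separated by real tab characters and the lines by \n.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GlossStripper
{
    // First pattern from the answer: the text that follows "#<digits>" up to the end of a line.
    public const string AfterSenseNumber = @"(?<=#\d+)[^#]*$";

    // Second pattern from the answer: the last tab of a line and everything after it.
    public const string LastTabField = @"\t[^\t]+$";

    // Hypothetical helper: removes every match of the pattern.
    // RegexOptions.Multiline (an assumption, not in the answer) lets $ match at each line end.
    public static string Strip(string input, string pattern)
    {
        return Regex.Replace(input, pattern, "", RegexOptions.Multiline);
    }

    public static void Main()
    {
        // Shortened sample lines; \t stands for the tabs in the SentiWordNet file.
        string input =
            "a\t00001740\t0.125\t0\table#1\t(usually followed by `to') having the necessary means\n" +
            "a\t00002312\t0\t0\tdorsal#2 abaxial#1\tfacing away from the axis of an organ or organism";

        Console.WriteLine(Strip(input, AfterSenseNumber));
        Console.WriteLine(Strip(input, LastTabField));
    }
}
```

Two caveats, as far as I can tell. The lookbehind is also satisfied partway through a multi-digit sense number such as #12, so the first pattern may cut off the trailing digits, and it will not behave as intended when a gloss itself contains #. The tab-based pattern avoids both issues, but it relies on the gloss being the last tab-separated field of the line.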

Answer 1: (score: 0)

This should do the job:

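// Tabs are written as \t here; the pattern relies on real tab characters between the fields.
string text = "a\t00001740\t0.125\t0\table#1\t(usually followed by `to') having the necessary means or skill or know-how or... ";

// Keep the first five tab-terminated fields and drop everything after them.
string res = Regex.Replace(text, @"((?:[^\t]+\t){5}).+$", "$1");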
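
The pattern above is shown on a single line, and it keeps the fifth tab at the end of the result. For the whole contents of the text box, a per-line variant along these lines might be more convenient. This is only a sketch: the class and method names are hypothetical, the regex is a modified version of the one in this answer (anchored with ^ and RegexOptions.Multiline, with the fifth tab moved outside the capture group), and it assumes \n line endings.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FieldKeeper
{
    // Variant of the answer's pattern (an assumption, not the answer's exact regex):
    // group 1 captures the first five tab-separated fields of a line;
    // the fifth tab and the rest of the line fall outside the group and are dropped.
    private static readonly Regex FirstFiveFields =
        new Regex(@"^((?:[^\t\n]+\t){4}[^\t\n]+)\t.*$", RegexOptions.Multiline);

    // Hypothetical helper: lines with fewer than five tabs do not match and stay unchanged.
    public static string KeepFirstFiveFields(string input)
    {
        return FirstFiveFields.Replace(input, "$1");
    }

    public static void Main()
    {
        // Shortened sample lines; \t stands for the tabs in the SentiWordNet file.
        string input =
            "a\t00002098\t0\t0.75\tunable#1\t(usually followed by `to') not having the necessary means\n" +
            "a\t00002312\t0\t0\tdorsal#2 abaxial#1\tfacing away from the axis of an organ or organism";

        Console.WriteLine(KeepFirstFiveFields(input));
    }
}
```

Because the fields are counted by tabs rather than located by the # character, this variant does not depend on what the word or the gloss contains.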