Selecting the best sentence out of N sentences in C#

Time: 2015-10-01 05:21:58

Tags: c# regex streamreader

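I have a text file in which the text is arranged line by line. Each sentence is repeated three times in a row. The only difference between the sentences in a group is how many tags they contain. The tags are [Loc], [Time] and [PER]. Consider the example below.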

Match between [Loc]India[/Loc] and [Loc]Seri Lanka[/Loc] will start at [Time]12 o'clock[/TIME]  
Match between [Loc]India[/Loc] and [Loc]Seri Lanka[/Loc] will start at 12 o'clock   
Match between [Loc]India[/Loc] and Seri Lanka will start at [Time]12 o'clock[/TIME]  
[PER]Dhoni[/PER] will lead Indian Team  
[PER]Dhoni[/PER] will lead [PER]Indian Team[/PER]  
Dhoni  will lead Indian Team  

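My goal is to select, from each group, the sentence with the largest number of tags. For example, in group 1, sentence 1 has a total of three tags ([Loc], [Loc] and [Time]); similarly, in group 2 it is sentence 2. I tried using StreamReader, but I couldn't work out how to skip sentences.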

1 Answer:

Answer 0: (score: 0)

If the tags literally appear in the sentences, then just count them.

(I'm writing this in VB.NET, but the syntax is basically the same)

Dim sen1 As String = "between [Loc]India[/Loc] and [Loc]Seri Lanka[/Loc] will start at [Time]12 o'clock[/TIME]"
' Splitting on "/" yields one more piece than there are closing tags
Dim split As String() = sen1.Split("/"c)
Dim senCount As Integer = split.Length - 1

There, now you know that sentence 1 has 3 tags.

You could even turn it into a function like this:

Private Function tagCount(ByVal thisSentence As String) As Integer
     ' Each closing tag contains exactly one "/", so pieces minus one = tag count
     Dim split As String() = thisSentence.Split("/"c)
     Dim senCount As Integer = split.Length - 1
     Return senCount
End Function
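
Since the question asks for C#, a rough C# counterpart of the function above might look like the sketch below. Note that it swaps the technique: instead of splitting on "/", it counts closing tags with a regular expression, so a stray "/" elsewhere in a sentence is not counted. The names TagCounter and CountTags are made up for illustration, and the pattern assumes the only tags are Loc, Time and PER in any letter case.

```csharp
using System.Text.RegularExpressions;

static class TagCounter
{
    // Assumption: only [/Loc], [/Time] and [/PER] occur, in any letter case.
    private static readonly Regex ClosingTag =
        new Regex(@"\[/(loc|time|per)\]", RegexOptions.IgnoreCase);

    // Returns the number of closing tags found in the sentence.
    public static int CountTags(string sentence)
    {
        return ClosingTag.Matches(sentence).Count;
    }
}
```

Counting closing tags rather than opening ones means the mismatched casing in the sample data ([Time] ... [/TIME]) does not matter.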

Then loop over your sentences to determine which one has the most tags.
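
For the looping step, here is a minimal C# sketch, not a definitive implementation. It assumes every group is exactly three consecutive non-empty lines, counts tags by their closing form with a regular expression (a different counting method from the split on "/"), and gives ties to the earliest sentence in the group. BestSentencePicker and PickBest are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class BestSentencePicker
{
    // Assumption: a tag is counted by its closing form, e.g. [/Loc] or [/TIME].
    private static readonly Regex ClosingTag =
        new Regex(@"\[/(loc|time|per)\]", RegexOptions.IgnoreCase);

    // Assumption: each group is groupSize consecutive non-empty lines.
    public static List<string> PickBest(IEnumerable<string> lines, int groupSize = 3)
    {
        // Drop blank lines so they do not shift the group boundaries.
        var sentences = lines.Where(l => !string.IsNullOrWhiteSpace(l)).ToList();
        var best = new List<string>();

        for (int i = 0; i < sentences.Count; i += groupSize)
        {
            // Keep the sentence with the most tags in this group.
            // OrderByDescending is a stable sort, so ties go to the earliest line.
            best.Add(sentences.Skip(i).Take(groupSize)
                .OrderByDescending(s => ClosingTag.Matches(s).Count)
                .First());
        }
        return best;
    }
}
```

You could feed it the file with something like PickBest(File.ReadLines(path)), which avoids juggling a StreamReader by hand; path here stands for wherever your text file lives.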