Question

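I have a text file in which the text is arranged line by line. Each sentence is repeated three times in a row. The only difference between the sentences in a group is the number of tags they contain. The tags are [Loc], [Time] and [PER]. Consider the example below.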

Match between [Loc]India[/Loc] and [Loc]Seri Lanka[/Loc] will start at [Time]12 o'clock[/TIME]  
Match between [Loc]India[/Loc] and [Loc]Seri Lanka[/Loc] will start at 12 o'clock   
Match between [Loc]India[/Loc] and Seri Lanka will start at [Time]12 o'clock[/TIME]  
[PER]Dhoni[/PER] will lead Indian Team  
[PER]Dhoni[/PER] will lead [PER]Indian Team[/PER]  
Dhoni  will lead Indian Team

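My goal is to select, from each group, the sentence with the largest number of tags. For example, in group 1, sentence 1 has three tags in total ([Loc], [Loc] and [Time]); likewise, in group 2 it is sentence 2. I tried using StreamReader, but I could not work out how to skip sentences.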

Answer 1

If the tags actually appear in the sentence text, just count them.

(I am writing this in VB.NET, but the syntax is basically the same.)

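Dim sen1 As String = "Match between [Loc]India[/Loc] and [Loc]Seri Lanka[/Loc] will start at [Time]12 o'clock[/TIME]"
' Each closing tag contains exactly one "/", so splitting on "/" yields (tags + 1) pieces.
Dim parts As String() = sen1.Split("/"c)
Dim senCount As Integer = parts.Length - 1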

There, now you know that sentence 1 has 3 tags.

You could even turn it into a function like this:

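Private Function tagCount(ByVal thisSentence As String) As Integer
     ' One "/" per closing tag, so the tag count is the number of pieces minus one.
     ' This assumes the sentence contains no other "/" characters.
     Dim parts As String() = thisSentence.Split("/"c)
     Dim senCount As Integer = parts.Length - 1
     Return senCount
End Function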
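
Since the question is about C#, here is a rough C# counterpart of the function above. It is only a sketch: the class name TagCounter and the method name CountTags are made up for illustration, and instead of counting slashes it counts closing tags with a regular expression, which assumes the only tags are Loc, Time and PER (in any letter case).

```csharp
using System.Text.RegularExpressions;

public static class TagCounter
{
    // Matches a closing tag such as [/Loc], [/TIME] or [/PER], ignoring case.
    // Assumption: these three are the only tag names in the file.
    private static readonly Regex ClosingTag =
        new Regex(@"\[/(loc|time|per)\]", RegexOptions.IgnoreCase);

    // Returns the number of closing tags, i.e. the number of tagged spans.
    public static int CountTags(string sentence)
    {
        return ClosingTag.Matches(sentence).Count;
    }
}
```

Unlike the slash count, this does not miscount a sentence that happens to contain a "/" elsewhere, such as a date.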

Then loop through your sentences to determine which one has the most tags.
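
The loop could be sketched in C# roughly as follows. The names BestSentencePicker and PickBest are hypothetical, and the sketch assumes that every group has exactly three consecutive lines, that blank lines can be ignored, and that a "/" only ever appears inside a closing tag.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class BestSentencePicker
{
    // Counts '/' characters; each closing tag such as [/Loc] contributes one.
    // Assumption: sentences contain no other slashes.
    private static int CountTags(string sentence)
    {
        return sentence.Count(c => c == '/');
    }

    // Splits the lines into consecutive groups of groupSize and returns,
    // for each group, the first line with the highest tag count.
    public static List<string> PickBest(IEnumerable<string> lines, int groupSize = 3)
    {
        var best = new List<string>();
        var group = new List<string>();
        foreach (var line in lines)
        {
            if (string.IsNullOrWhiteSpace(line)) continue; // skip blank lines
            group.Add(line.Trim());
            if (group.Count == groupSize)
            {
                // OrderByDescending is a stable sort, so ties keep file order.
                best.Add(group.OrderByDescending(s => CountTags(s)).First());
                group.Clear();
            }
        }
        // A trailing incomplete group is still evaluated.
        if (group.Count > 0)
        {
            best.Add(group.OrderByDescending(s => CountTags(s)).First());
        }
        return best;
    }
}
```

To run it against a file, something like BestSentencePicker.PickBest(File.ReadLines("sentences.txt")) should work, where sentences.txt is a placeholder path. Because the lines are consumed one at a time, there is no need to skip ahead with StreamReader manually.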

Selecting the best sentence out of N sentences in C#