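I have a text file in which the text is arranged sentence by sentence. Each sentence is repeated 3 times in a row. The only difference between the sentences in a group is the number of tags that appear. The tags are [Loc], [Time] and [PER]. Consider the example below.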
Match between [Loc]India[/Loc] and [Loc]Seri Lanka[/Loc] will start at [Time]12 o'clock[/TIME]
Match between [Loc]India[/Loc] and [Loc]Seri Lanka[/Loc] will start at 12 o'clock
Match between [Loc]India[/Loc] and Seri Lanka will start at [Time]12 o'clock[/TIME]
[PER]Dhoni[/PER] will lead Indian Team
[PER]Dhoni[/PER] will lead [PER]Indian Team[/PER]
Dhoni will lead Indian Team
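My goal is to select, from each group, the sentence that has the largest number of tags. For example, in group 1, sentence 1 has three tags in total ([Loc], [Loc] and [Time]); likewise, sentence 2 has the most tags in the second group. I tried using StreamReader, but I could not work out how to skip sentences.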
Answer 0 (score: 0)
If the tags literally appear in the sentences, then just count them.
(I am writing this in VB.net, but the syntax is basically the same.)
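Dim sen1 As String = "Match between [Loc]India[/Loc] and [Loc]Seri Lanka[/Loc] will start at [Time]12 o'clock[/TIME]"
' Each closing tag contains exactly one "/", so splitting on "/" yields one more piece than there are tags.
Dim parts As String() = sen1.Split("/"c)
' Subtract 1 to get the number of tags. This assumes "/" never appears outside a closing tag.
Dim senCount As Integer = parts.Length - 1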
There, now you know that sentence 1 has 3 tags.
You can even turn it into a function like this:
Private Function tagCount(ByVal thisSentence As String) As Integer
    ' Splitting on "/" yields one more piece than there are closing tags.
    Dim parts As String() = thisSentence.Split("/"c)
    Dim senCount As Integer = parts.Length - 1
    Return senCount
End Function
Then loop through your sentences to determine which one has the most tags.
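The loop itself could look something like the sketch below. It is written in Python only to keep it short; the same logic should carry over to VB.net or C#. It assumes that groups are always exactly three consecutive non-blank lines, and that a tag is best counted by its closing marker ([/Loc], [/TIME], [/PER], matched case-insensitively, since the sample mixes [Time] with [/TIME]). The names tag_count, best_per_group and SAMPLE are made up for this illustration.

```python
import re
from typing import Iterable, List

# Matches the closing tags [/Loc], [/Time] and [/PER] in any letter case.
CLOSING_TAG = re.compile(r"\[/(?:loc|time|per)\]", re.IGNORECASE)


def tag_count(sentence: str) -> int:
    """Count tagged spans in a sentence by counting their closing tags."""
    return len(CLOSING_TAG.findall(sentence))


def best_per_group(lines: Iterable[str], group_size: int = 3) -> List[str]:
    """Return, for each consecutive group of lines, the one with the most tags.

    Assumption: blank lines are not part of any group and are skipped.
    """
    sentences = [line.strip() for line in lines if line.strip()]
    best = []
    for start in range(0, len(sentences), group_size):
        group = sentences[start:start + group_size]
        # max() returns the first sentence of the group when counts tie.
        best.append(max(group, key=tag_count))
    return best


# The six lines from the question, used as sample input.
SAMPLE = [
    "Match between [Loc]India[/Loc] and [Loc]Seri Lanka[/Loc] will start at [Time]12 o'clock[/TIME]",
    "Match between [Loc]India[/Loc] and [Loc]Seri Lanka[/Loc] will start at 12 o'clock",
    "Match between [Loc]India[/Loc] and Seri Lanka will start at [Time]12 o'clock[/TIME]",
    "[PER]Dhoni[/PER] will lead Indian Team",
    "[PER]Dhoni[/PER] will lead [PER]Indian Team[/PER]",
    "Dhoni will lead Indian Team",
]

if __name__ == "__main__":
    for sentence in best_per_group(SAMPLE):
        print(sentence)
```

To run it against a real file, pass the open file object instead of SAMPLE, since iterating over a file yields its lines. Matching whole closing tags rather than a bare "/" avoids miscounting sentences that contain a slash for other reasons, such as a date.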