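I am building a program that gets a book's publisher by scanning its title page and running OCR on it. Because the publisher is always at the bottom of the title page, I think detecting lines separated by blank space is a solution, but I don't know how to test for that. Here is my code: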
Dim builder As New StringBuilder()
Dim reader As New StringReader(txtOCR.Text)
Dim iCounter As Integer = 0
While True
    Dim line As String = reader.ReadLine()
    If line Is Nothing Then Exit While
    ' I want to put the condition here
End While
txtPublisher.Text = builder.ToString()
Answer 0 (score: 2)
Do you mean blank lines? Then you could do this:
Dim bEmpty As Boolean
Then, inside the loop:
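If line.Trim().Length = 0 Then
    ' Blank line: remember it for the next iteration.
    bEmpty = True
Else
    If bEmpty Then
        ' This is the first non-blank line after a blank one.
        '...
    End If
    bEmpty = False
End If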
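One way to combine this flag with the loop from the question is sketched below. It assumes the publisher is the last block of non-blank lines on the page; the module name `PublisherSketch`, the helper `LastBlock`, and the sample text are hypothetical and used only for illustration.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module PublisherSketch
    ' Hypothetical helper: returns the last block of non-blank lines in ocrText.
    Function LastBlock(ocrText As String) As String
        Dim builder As New StringBuilder()
        Dim bEmpty As Boolean = False
        Using reader As New StringReader(ocrText)
            Dim line As String = reader.ReadLine()
            Do While line IsNot Nothing
                If line.Trim().Length = 0 Then
                    bEmpty = True
                Else
                    ' A non-blank line right after a blank one starts a new block,
                    ' so discard whatever was collected for the previous block.
                    If bEmpty Then builder.Clear()
                    If builder.Length > 0 Then builder.Append(Environment.NewLine)
                    builder.Append(line.Trim())
                    bEmpty = False
                End If
                line = reader.ReadLine()
            Loop
        End Using
        Return builder.ToString()
    End Function

    Sub Main()
        ' Made-up OCR text: title, author, then a two-line publisher block.
        Dim sample As String = "A Title" & vbLf & vbLf & "An Author" & vbLf & vbLf &
                               "Example Press" & vbLf & "London" & vbLf
        Console.WriteLine(LastBlock(sample))
    End Sub
End Module
```

In the question's form, the result of `LastBlock(txtOCR.Text)` would be assigned to `txtPublisher.Text`. Trailing blank lines do not disturb the result, because the builder is only cleared when a new non-blank line arrives.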
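Answer 1 (score: 1)
Why not do the following: start at the bottom and move up until you find the first non-empty line (I don't know how the OCR works... perhaps the bottom-most line is always non-empty, in which case this step is redundant). Then keep moving up until you reach the first empty line. The text in between is the publisher.
You don't need a StringReader for this: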
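' Split on both Windows and Unix line endings.
Dim lines As String() = txtOCR.Text.Split({vbCrLf, vbLf}, StringSplitOptions.None)
Dim bottom As Integer = lines.Length - 1
' Find the bottom-most non-empty line (bottom ends at -1 if there is none).
Do While bottom >= 0 AndAlso String.IsNullOrWhiteSpace(lines(bottom))
    bottom -= 1
Loop
' Find the empty line above that (top ends at -1 if there is none).
Dim top As Integer = bottom - 1
Do Until top < 0 OrElse String.IsNullOrWhiteSpace(lines(top))
    top -= 1
Loop
Dim publisher As String = ""
If bottom >= 0 Then
    ' VB array declarations take an inclusive upper bound, so this holds bottom - top lines.
    Dim publisherSubset(bottom - top - 1) As String
    Array.Copy(lines, top + 1, publisherSubset, 0, bottom - top)
    publisher = String.Join(Environment.NewLine, publisherSubset)
End If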
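But to be honest, I don't think this is a particularly good approach. It is inflexible and does not cope well with unexpected formatting. I would instead use a regular expression to describe what the publisher string (and its context) looks like. Even that may not be enough, and you may have to reason about the whole page to infer which parts are the publisher.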
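As a rough sketch of the regular-expression idea, the following looks for lines containing a publisher-like keyword and takes the last such line, since the publisher is expected near the bottom of the page. The keyword list, the module name `PublisherRegexSketch`, the helper `FindPublisher`, and the sample text are all assumptions made for illustration; real title pages would likely need a richer pattern.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PublisherRegexSketch
    ' Assumed keyword list, purely illustrative; extend it for real data.
    Private ReadOnly PublisherPattern As New Regex(
        "^.*\b(Press|Publishing|Publishers|Books)\b.*$",
        RegexOptions.Multiline Or RegexOptions.IgnoreCase)

    ' Hypothetical helper: returns the last line that looks like a publisher,
    ' or Nothing when no line matches.
    Function FindPublisher(ocrText As String) As String
        ' Normalise line endings so that $ matches cleanly at each line end.
        Dim normalised As String = ocrText.Replace(vbCrLf, vbLf)
        Dim matches As MatchCollection = PublisherPattern.Matches(normalised)
        If matches.Count = 0 Then Return Nothing
        ' Prefer the last match, since the publisher is expected near the bottom.
        Return matches(matches.Count - 1).Value.Trim()
    End Function

    Sub Main()
        ' Made-up OCR text for demonstration only.
        Dim sample As String = "The Book of Examples" & vbCrLf & vbCrLf &
                               "Example University Press" & vbCrLf & "2001"
        Console.WriteLine(FindPublisher(sample))
    End Sub
End Module
```

A keyword match like this can still pick up a title that happens to contain "Press" or "Books", which is why the sketch prefers the last match and why whole-page reasoning may still be needed.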
Answer 2 (score: 1)
Assuming the publisher is always on the last line and always follows a blank line, perhaps the following would work?
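Dim lines As New List(Of String)
Dim currentLine As String = ""
Dim previousLine As String = ""
' StringReader reads from the text itself; StreamReader would expect a file path.
Using reader As New StringReader(txtOCR.Text)
    currentLine = reader.ReadLine()
    Do While currentLine IsNot Nothing
        ' Keep only non-blank lines that directly follow a blank line (or start the text).
        If String.IsNullOrWhiteSpace(previousLine) AndAlso
           Not String.IsNullOrWhiteSpace(currentLine) Then lines.Add(currentLine)
        previousLine = currentLine
        currentLine = reader.ReadLine()
    Loop
End Using
txtPublisher.Text = lines.LastOrDefault()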
To ignore whether the previous line is blank or empty:
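Dim lines As New List(Of String)
Using reader As New StringReader(txtOCR.Text)
    Dim currentLine As String = reader.ReadLine()
    Do While currentLine IsNot Nothing
        lines.Add(currentLine)
        currentLine = reader.ReadLine()
    Loop
End Using
txtPublisher.Text = lines.LastOrDefault()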