Question

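I am building a program that scans a book's title page and uses OCR to get the book's publisher. Because the publisher is always at the bottom of the title page, I think detecting lines separated by a space is a solution, but I don't know how to test for that. Here is my code: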

Dim builder As New StringBuilder()
Dim reader As New StringReader(txtOCR.Text)
Dim iCounter As Integer = 0
While True
    Dim line As String = reader.ReadLine()
    If line Is Nothing Then Exit While

    ' I want to put the condition here

End While
txtPublisher.Text = builder.ToString()

Answer 1

Do you mean blank lines? Then you can do this:

Dim bEmpty As Boolean

Then, inside the loop:

If line.Trim().Length = 0 Then
    bEmpty = True
Else
    If bEmpty Then
        '...
    End If

    bEmpty = False
End If
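
For illustration, here is one way the flag could be wired into the reading loop from the question. This is only a sketch: the `LastBlock` helper and the sample text are hypothetical, and it assumes that what you want is the last run of non-blank lines in the OCR output.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module BlankLineFlagSketch
    ' Hypothetical helper (not from the question): returns the last run of
    ' non-blank lines in text, joined with the platform newline.
    Function LastBlock(text As String) As String
        Dim builder As New StringBuilder()
        Dim bEmpty As Boolean = False

        Using reader As New StringReader(text)
            Dim line As String = reader.ReadLine()
            Do While line IsNot Nothing
                If line.Trim().Length = 0 Then
                    bEmpty = True
                Else
                    ' A non-blank line right after a blank one starts a new block,
                    ' so drop whatever was collected for the previous block.
                    If bEmpty Then builder.Clear()
                    If builder.Length > 0 Then builder.Append(Environment.NewLine)
                    builder.Append(line.Trim())
                    bEmpty = False
                End If
                line = reader.ReadLine()
            Loop
        End Using

        Return builder.ToString()
    End Function

    Sub Main()
        ' Made-up sample standing in for txtOCR.Text.
        Dim sample As String = "A Title" & vbLf & "An Author" & vbLf & vbLf &
                               "Some Press" & vbLf & "London"
        Console.WriteLine(LastBlock(sample))
    End Sub
End Module
```

In the form from the question you would presumably assign the result to `txtPublisher.Text` instead of writing it to the console.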

Answer 2

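Why not do the following: start from the bottom and go up until you find the first non-empty line (I don't know how the OCR works... perhaps the bottom-most line is always non-empty, in which case this step is redundant). In the next step, continue upward until the first empty line. The text in between is the publisher.

You don't need the StringReader: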

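' Pass the separator as an array: this overload exists on every .NET version.
Dim lines As String() = txtOCR.Text.Split({Environment.NewLine}, StringSplitOptions.None)
Dim bottom As Integer = lines.Length - 1

' Find bottom-most non-empty line (bottom ends at -1 if there is none).
Do While bottom >= 0 AndAlso String.IsNullOrWhiteSpace(lines(bottom))
    bottom -= 1
Loop

' Find empty line above that, stopping at the start of the text.
Dim top As Integer = Math.Max(bottom - 1, -1)

Do Until top < 0 OrElse String.IsNullOrWhiteSpace(lines(top))
    top -= 1
Loop

' Copy the lines strictly between top and bottom.
Dim count As Integer = bottom - top
Dim publisherSubset(count - 1) As String
Array.Copy(lines, top + 1, publisherSubset, 0, count)
Dim publisher As String = String.Join(Environment.NewLine, publisherSubset)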

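But honestly, I don't think this is a particularly good approach. It is inflexible and doesn't cope well with unexpected formatting. I would use a regular expression to describe what the publisher string (and its context) looks like. Even that may not be enough, and you may have to consider the whole page to infer which bits are the publisher.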
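
To make the regular-expression idea a bit more concrete, here is a rough sketch. Everything specific in it is an assumption for illustration: the keyword list (`Press`, `Publishers`, `Publishing`, `Books`, `Verlag`), the `FindPublisher` helper, and the sample text. A real pattern would have to be tuned against actual scans.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PublisherRegexSketch
    ' Assumed keyword list, purely illustrative. Multiline makes ^ and $
    ' match at line boundaries instead of only at the ends of the text.
    Private ReadOnly PublisherPattern As New Regex(
        "^.*\b(Press|Publishers?|Publishing|Books|Verlag)\b.*$",
        RegexOptions.Multiline Or RegexOptions.IgnoreCase)

    ' Hypothetical helper: returns the last line that looks like a publisher,
    ' or an empty string when no line matches.
    Function FindPublisher(ocrText As String) As String
        Dim matches As MatchCollection = PublisherPattern.Matches(ocrText)
        If matches.Count = 0 Then Return String.Empty
        ' Trim removes the carriage return that $ leaves behind on CRLF input.
        Return matches(matches.Count - 1).Value.Trim()
    End Function

    Sub Main()
        ' Made-up sample standing in for txtOCR.Text.
        Dim sample As String = "The Book" & vbCrLf & "Jane Doe" & vbCrLf & vbCrLf &
                               "Oxford University Press" & vbCrLf & "1999"
        Console.WriteLine(FindPublisher(sample))
    End Sub
End Module
```

Taking the last match rather than the first reflects the question's premise that the publisher sits near the bottom of the page. A keyword list like this will miss publishers whose names contain none of the keywords, which is where looking at the whole page comes in.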

Answer 3

Assuming the publisher is always on the last line and always comes after an empty line, then perhaps the following?

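    Dim lines As New List(Of String)
    Dim currentLine As String = ""
    Dim previousLine As String = ""

    ' The OCR result is a string, so read it with a StringReader.
    Using reader As New StringReader(txtOCR.Text)
        currentLine = reader.ReadLine()
        Do While currentLine IsNot Nothing
            ' Keep only lines that directly follow a blank line.
            If String.IsNullOrWhiteSpace(previousLine) Then lines.Add(currentLine)
            previousLine = currentLine
            currentLine = reader.ReadLine()
        Loop
    End Using

    txtPublisher.Text = lines.LastOrDefault()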

To ignore whether the previous line is empty or blank:

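Dim lines As New List(Of String)
Using reader As New StringReader(txtOCR.Text)
    ' Collect every line; the last one is taken as the publisher.
    Dim currentLine As String = reader.ReadLine()
    Do While currentLine IsNot Nothing
        lines.Add(currentLine)
        currentLine = reader.ReadLine()
    Loop
End Using

txtPublisher.Text = lines.LastOrDefault()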

How can I check whether the lines in a string are separated by a space?
