Question

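I'm trying to extract some specific data from a PDF. I've managed to extract the text from the PDF and put it into a txt file. The data in the text file is one long line, and I need to extract a specific part of that line.

The part I need starts with 'UK' and ends with ' - - '.

I have been trying to use: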

    Using read = New StreamReader(fName)
        Dim line As String = read.ReadToEnd
        If line.StartsWith(" UK") And line.Contains("- -") Then

        Else
            'do nothing
        End If

    End Using

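StartsWith doesn't work because the line doesn't start with "UK". I can use line.Contains, since it does find UK, but the line contains multiple instances of ' - - '.

The part I need looks like this:

UK (0.6085)* (£) 1.6435 -0.0062 0.8206 -0.0017 - -

I'm using VB.NET in MS Visual Studio 2013.

Can anyone offer some help?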

Answer 1

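Try using the Regex class:

Imports System.Text.RegularExpressions

' "line" is the text read from the file; .*? is lazy so the match stops at the first "- -" after "UK"
Dim regex As New Regex("UK.*?-\s?-\s?", RegexOptions.Singleline)
Dim match As Match = regex.Match(line)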

If match.Success Then
    ' Do stuff
End If

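Inside the If ... Then block you can then loop over the captures through the Match.Captures collection property.

For Each c As Capture In match.Captures
    ' c.Value
Next

Regular expressions are a great tool for text matching, extraction and the like. If you do a fair amount of this, get used to using them. I find RegexStudio very handy for testing .NET Regex patterns on the fly before using them in code.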
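
As a rough sketch of how the pieces above might fit together, the console module below reads the whole text and prints the first stretch that runs from "UK" to the nearest "- -". The file name rates.txt, the module name and the helper ExtractUkSegment are made up for illustration. The lazy quantifier .*? is an assumption about the intended behaviour: with a greedy .* the match would run on to the last "- -" in the text, which is the problem described in the question.

```vbnet
Imports System
Imports System.IO
Imports System.Text.RegularExpressions

Module ExtractDemo
    ' Hypothetical helper: returns the text from "UK" up to the nearest "- -",
    ' or Nothing when no such stretch exists.
    Function ExtractUkSegment(content As String) As String
        ' .*? is lazy, so the match stops at the first "- -" after "UK".
        ' Singleline lets "." cross line breaks in case the text wraps.
        Dim m As Match = Regex.Match(content, "UK.*?-\s?-", RegexOptions.Singleline)
        Return If(m.Success, m.Value, Nothing)
    End Function

    Sub Main()
        ' "rates.txt" is an assumed file name; substitute the real path.
        Dim content As String = File.ReadAllText("rates.txt")
        Dim segment As String = ExtractUkSegment(content)
        If segment IsNot Nothing Then
            Console.WriteLine(segment)
        End If
    End Sub
End Module
```

If "UK" can also occur inside longer words in the text, anchoring it with \bUK\b may be worth trying.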

Answer 2

How about StartsWith and EndsWith?

If src.StartsWith("UK") AndAlso src.EndsWith("- -") Then
    'True
End If
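
Note that this check only passes when src is already the isolated segment. For the one long line described in the question, a possible variation is to locate the markers with IndexOf and cut the segment out with Substring. The sketch below assumes that approach; the helper name ExtractBetween and the surrounding "header"/"footer" sample text are invented for the example.

```vbnet
Imports System

Module IndexOfDemo
    ' Hypothetical helper: finds startMarker, then the first endMarker after it,
    ' and returns everything from one to the other (markers included).
    Function ExtractBetween(src As String, startMarker As String, endMarker As String) As String
        Dim startPos As Integer = src.IndexOf(startMarker, StringComparison.Ordinal)
        If startPos < 0 Then Return Nothing

        ' Search for the end marker only after the start marker.
        Dim endPos As Integer = src.IndexOf(endMarker, startPos + startMarker.Length, StringComparison.Ordinal)
        If endPos < 0 Then Return Nothing

        Return src.Substring(startPos, endPos + endMarker.Length - startPos)
    End Function

    Sub Main()
        ' Made-up long line built around the sample from the question.
        Dim src As String = "header - - UK (0.6085)* (£) 1.6435 -0.0062 0.8206 -0.0017 - - footer - -"
        Dim segment As String = ExtractBetween(src, "UK", "- -")
        If segment IsNot Nothing Then
            Console.WriteLine(segment)
        End If
    End Sub
End Module
```

Once the segment has been isolated this way, the StartsWith/EndsWith test above holds for it.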

Answer 3

Simple solution:

struct Point {
   var x:Float = 0
}

var p1 = Point()
var p2 = p1 //p1 and p2 share the same data under the hood
p2.x += 1 //p2 now has its own copy of the data

Extracting part of a line from a text file
