Question

I have code that searches every page of a PDF document for the word Data_ID.

This string appears on every other page of the PDF, and its value changes like this:

data_id 400M549822

data_id 400M549233

etc.

Right now my console prints the string data_id every time it finds it, but I also want it to return the characters that come after it.

This is what I have so far:

Imports Bytescout.PDFExtractor
Imports System.IO
Imports System.Text.RegularExpressions

Module Module1
    Class PageType
        Property Identifier As String
    End Class

    Sub Main()
        Dim direcory = "C:\Users\XBorja.RESURGENCE\Desktop\one main\"
        Dim pageTypes As New List(Of PageType)
        Dim ids = "data_id"
        Dim resultstring As String
        resultstring = Regex.Match(ids, "(?<=^.{1}).*(?=.{5}$)").Value

        Dim currentPageTypeName = "unknown"

        For Each inputfile As String In Directory.GetFiles(direcory)
            For i = 0 To ids.Length - 1
                pageTypes.Add(New PageType With {.Identifier = ids(i)})
            Next

            Dim extractor As New TextExtractor()
            extractor.LoadDocumentFromFile(inputfile)
            Dim pageCount = extractor.GetPageCount()

            For i = 0 To pageCount - 1
                '        ' Find the type of the current page
                '        ' If it is not present on the page, then the last one found will be used.
                For Each pt In pageTypes
                    Console.WriteLine(resultstring)
                Next
            Next
        Next
    End Sub
End Module

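resultstring is what I was trying to use with the regular expression, but it only counts positions inside data_id itself, not the characters after it.

So how can I make it return the 10 characters that follow the word data_id (not counting the space)?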

Answer 1

This returns 11 characters, including the leading space:

'Dim ids = "data_id 400M549822"
Dim ids = "data_id 400M549233"
Dim resultstring = Regex.Match(ids, "(?<=data_id)(\s\w{10})$").Value
Console.WriteLine(resultstring)

Output:

 400M549233
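The question asks for the 10 characters without the space. One possible variation, offered as a sketch rather than as part of the original answer, moves the whitespace into the lookbehind so that it is required but not returned. The module name `IdDemo` and the function name `ExtractId` are hypothetical, and the `$` anchor is dropped on the assumption that real page text continues after the ID.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module IdDemo
    ' Hypothetical helper: returns the 10 word characters that follow
    ' "data_id" plus one whitespace character, or "" when nothing matches.
    Function ExtractId(input As String) As String
        ' The lookbehind requires "data_id" and one whitespace character
        ' before the match, but neither becomes part of the returned value.
        Return Regex.Match(input, "(?<=data_id\s)\w{10}").Value
    End Function

    Sub Main()
        Console.WriteLine(ExtractId("data_id 400M549233")) ' prints 400M549233
    End Sub
End Module
```

Calling `.Trim()` on the answer's 11-character result would likely give the same string for this input.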

Some notes:

– ?<= = positive lookbehind
– \s = one whitespace character
– \w{10} = 10 word characters, including A-Z, a-z, 0-9, _
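To connect this back to the page loop in the question, here is a minimal sketch that applies a similar pattern to the text of each page and collects every ID it finds. It assumes the page text is already available as a string, however it is obtained from the extractor; the `pages` array is made-up sample data, and `PageIdDemo` and `FindIds` are hypothetical names. A capture group is used instead of a lookbehind, which is a substitution for the answer's approach, not the answer's own method.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module PageIdDemo
    ' Hypothetical helper: collects every ID that follows "data_id" in a block of text.
    Function FindIds(pageText As String) As List(Of String)
        Dim ids As New List(Of String)
        ' Group 1 captures the 10 word characters; \s+ tolerates more than
        ' one whitespace character between "data_id" and the ID.
        For Each m As Match In Regex.Matches(pageText, "data_id\s+(\w{10})")
            ids.Add(m.Groups(1).Value)
        Next
        Return ids
    End Function

    Sub Main()
        ' Stand-in for the text of each PDF page (sample data, not real extractor output).
        Dim pages = {
            "Header" & Environment.NewLine & "data_id 400M549822" & Environment.NewLine & "Body",
            "A page without an identifier",
            "data_id 400M549233"
        }

        For i = 0 To pages.Length - 1
            For Each id In FindIds(pages(i))
                Console.WriteLine($"Page {i}: {id}")
            Next
        Next
    End Sub
End Module
```

With the sample data above, this should print one line for page 0 and one for page 2, and nothing for the page that has no ID. If the PDF mixes `Data_ID` and `data_id`, passing `RegexOptions.IgnoreCase` to `Regex.Matches` is one way to match both.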
