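I have code that searches every page of a PDF document for the word data_id.
This word appears on every other page of the PDF, followed by a value that changes, like this: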
data_id 400M549822
data_id 400M549233
etc.
Right now my console prints something every time it finds the string data_id, but I also want it to return the characters that come after it.
This is what I have so far:
Imports Bytescout.PDFExtractor
Imports System.IO
Imports System.Text.RegularExpressions

Module Module1
    Class PageType
        Property Identifier As String
    End Class

    Sub Main()
        Dim direcory = "C:\Users\XBorja.RESURGENCE\Desktop\one main\"
        Dim pageTypes As New List(Of PageType)
        Dim ids = "data_id"
        Dim resultstring As String
        resultstring = Regex.Match(ids, "(?<=^.{1}).*(?=.{5}$)").Value
        Dim currentPageTypeName = "unknown"
        For Each inputfile As String In Directory.GetFiles(direcory)
            For i = 0 To ids.Length - 1
                pageTypes.Add(New PageType With {.Identifier = ids(i)})
            Next
            Dim extractor As New TextExtractor()
            extractor.LoadDocumentFromFile(inputfile)
            Dim pageCount = extractor.GetPageCount()
            For i = 0 To pageCount - 1
                ' Find the type of the current page
                ' If it is not present on the page, then the last one found will be used.
                For Each pt In pageTypes
                    Console.WriteLine(resultstring)
                Next
            Next
        Next
    End Sub
End Module
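resultstring is what I tried to use with the regular expression, but it only works on positions inside data_id itself, not on what comes after it.
So how can I make it return the 10 characters that follow the word data_id (not counting the space)?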
Answer 0 (score: 1)
This returns 11 characters: the 10-character value preceded by the space:
'Dim ids = "data_id 400M549822"
Dim ids = "data_id 400M549233"
Dim resultstring = Regex.Match(ids, "(?<=data_id)(\s\w{10})$").Value
Console.WriteLine(resultstring)
Output:
400M549233
Some notes:
– ?<= = positive lookbehind
– \s = one whitespace character
– \w{10} = 10 word characters, including A-Z, a-z, 0-9 and _
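Since the question asks for the 10 characters without the space, and the value occurs on many pages, the following is a minimal sketch of one way to adapt the pattern. It assumes the page text is already available as a string; the sample text, the module name Example and the variable pageText are hypothetical stand-ins, not part of the original code. The whitespace is moved inside the lookbehind so it is not part of the match, and Regex.Matches is used so every occurrence is returned.

```vb
Imports System
Imports System.Text.RegularExpressions

Module Example
    Sub Main()
        ' Hypothetical sample standing in for text extracted from PDF pages.
        Dim pageText As String = "header data_id 400M549822 body" & Environment.NewLine &
                                 "footer data_id 400M549233"

        ' The lookbehind covers "data_id" plus the whitespace after it,
        ' so the match itself holds only the 10 word characters that follow.
        Dim pattern As String = "(?<=data_id\s+)\w{10}"

        ' Regex.Matches returns every occurrence, not just the first one.
        For Each m As Match In Regex.Matches(pageText, pattern)
            Console.WriteLine(m.Value)
        Next
    End Sub
End Module
```

With the sample text above, this prints 400M549822 and 400M549233, each on its own line and without a leading space. If the casing in the PDF varies (Data_ID versus data_id), passing RegexOptions.IgnoreCase as a third argument to Regex.Matches may help.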