Regex to extract data that appears before certain text

Posted: 2017-04-13 14:11:20

Tags: regex excel vba excel-vba

I have large text documents that contain some data I want to extract.

As you can see in the screenshot, I want to extract A040 into the Excel column next to the file name.

After A040 there are always three spaces and then the word "Sheet" (also visible in the screenshot).

Each file has a different number, but it is always the letter A followed by three digits, and then the text "Sheet". Uploaded sample file:

I tried to do this with VBA in Excel, but it does not work properly:

    ' ... (excerpt: the start of the Sub and the Do loop over the files are not shown)
    Dim cell As Range
    Dim rng As Range
    Dim output As String

    Set rng = ws.Range("A1", ws.Range("A1").SpecialCells(xlLastCell).Address)

    For Each cell In rng

        On Error Resume Next
        output = ExtA(cell.Value)

        If Len(output) > 0 Then
            Range("B" & j) = output
            Exit For
        End If
    Next

    j = j + 1
    ws.Cells.ClearContents
    'Call DelConns

    strFileName = Dir 'next file

Loop
End Sub

Function ExtA(ByVal text As String) As String
'REGEX Match VBA in excel

Dim result As String
Dim allMatches As Object
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")

RE.Pattern = "(?<=Sheet)[^Sheet]*\ Sheet"
RE.Global = True
RE.IgnoreCase = True
Set allMatches = RE.Execute(text)

If allMatches.Count <> 0 Then
    result = allMatches.Item(0).submatches.Item(0)
End If

ExtA = result

End Function
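A likely reason ExtA above returns nothing: VBScript.RegExp implements the older JavaScript regex flavor and, as far as I know, has no lookbehind, so "(?<=Sheet)" makes Execute raise a run-time error, which the On Error Resume Next in the calling loop silently swallows. Even apart from that, "[^Sheet]" is a character class (any single character other than S, h, e, t), not "anything except the word Sheet", and submatches.Item(0) needs a capturing group that the pattern does not contain. A minimal sketch of a corrected ExtA, assuming the code is always "A" plus three digits followed by exactly three whitespace characters and "Sheet", matches the whole context and captures only the code:

```vba
Option Explicit

' Sketch of a corrected ExtA. Assumption: the wanted code is "A" + three digits,
' followed by exactly three whitespace characters and then "Sheet".
Function ExtA(ByVal text As String) As String
    Dim RE As Object
    Dim allMatches As Object

    Set RE = CreateObject("VBScript.RegExp")
    ' No lookbehind in VBScript.RegExp: match the surrounding text and
    ' capture only the code in group 1.
    RE.Pattern = "(A\d{3})\s{3}Sheet"
    RE.Global = False       ' the first occurrence is enough
    RE.IgnoreCase = True    ' kept from the question's version

    Set allMatches = RE.Execute(text)
    If allMatches.Count > 0 Then
        ' SubMatches(0) holds the text captured by (A\d{3})
        ExtA = allMatches.Item(0).SubMatches.Item(0)
    Else
        ExtA = vbNullString
    End If
End Function
```

It keeps the question's name and signature, so it can replace the original function without changing the calling loop.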

3 answers:

Answer 0 (score: 3)

This seems to work on your sample.

Option Explicit

Function AthreeDigits(str As String) As String
    Static rgx As Object
    Dim cmat As Object

    'with rgx as static, it only has to be created once; beneficial when filling a long column with this UDF
    If rgx Is Nothing Then
        Set rgx = CreateObject("VBScript.RegExp")
    End If
    AthreeDigits = vbNullString

    With rgx
        .Global = False
        .MultiLine = True
        'VBScript.RegExp has no \A anchor; "\A" appears to match a literal "A", so plain A is written here
        .Pattern = "A[0-9]{3}\s{3}Sheet"
        If .Test(str) Then
            Set cmat = .Execute(str)
            'the match starts with "A" + three digits, so the first four characters are the code
            AthreeDigits = Left(cmat.Item(0).Value, 4)
        End If
    End With
End Function

Answer 1 (score: 2)

Do you mean there are 4 spaces after A040 and before "Sheet"? If so, try this pattern:

.pattern = "(A\d\d\d)\s{3}Sheet"
Edit: I thought you said 4 spaces, but you said 3; my pattern now reflects that. Edit 2: (I need more coffee!) Changed \b to \s.

Answer 2 (score: 0)

See the examples here:

"\s+[Aa]\d*\s+Sheet"

\s+[Aa]\d*\s+(Sheet)

[Aa]\d*\s+(Sheet)

Demo:
https://regex101.com/r/Qo8iUf/3


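\s+ matches any whitespace character (equivalent to [\r\n\t\f\v ])
+ quantifier: matches between one and unlimited times, as many times as possible
[Aa] matches a single character in the list Aa (case sensitive)
\d* matches a digit (equivalent to [0-9])
* quantifier: matches between zero and unlimited times, as many times as possible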
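The patterns in these answers differ both in what they accept and in what ends up in the match. A small sketch, using a hypothetical FirstMatch helper and made-up sample lines, shows that the leading \s+ pulls the spaces before the code into the match, and that \d* also accepts an "A" with no digits at all, whereas A\d{3}\s{3}Sheet only matches the exact layout described in the question:

```vba
Option Explicit

' Hypothetical helper: returns the whole first match of pat in text,
' or an empty string when there is no match (late-bound VBScript.RegExp).
Function FirstMatch(ByVal text As String, ByVal pat As String) As String
    Dim rgx As Object
    Dim mc As Object

    Set rgx = CreateObject("VBScript.RegExp")
    rgx.Pattern = pat
    rgx.Global = False          ' only the first match is needed
    Set mc = rgx.Execute(text)
    If mc.Count > 0 Then FirstMatch = mc.Item(0).Value
End Function

Sub ComparePatterns()
    ' Made-up sample text mimicking the layout described in the question.
    Const sample As String = "Title block  A040   Sheet 1"

    ' \s+ before the letter: the two leading spaces become part of the match.
    Debug.Print "[" & FirstMatch(sample, "\s+[Aa]\d*\s+Sheet") & "]"   ' [  A040   Sheet]
    ' Exact layout: "A", three digits, three whitespace characters, "Sheet".
    Debug.Print "[" & FirstMatch(sample, "A\d{3}\s{3}Sheet") & "]"      ' [A040   Sheet]

    ' \d* also accepts zero digits; \d{3} does not.
    Debug.Print "[" & FirstMatch("x A   Sheet", "[Aa]\d*\s+Sheet") & "]"   ' [A   Sheet]
    Debug.Print "[" & FirstMatch("x A   Sheet", "A\d{3}\s{3}Sheet") & "]"  ' []
End Sub
```

To get only the code rather than the whole match, the digits part can be wrapped in a capturing group and read back through SubMatches.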
