.txt using REGEX

Date: 2018-07-12 17:28:52

Tags: powershell csv logging extraction

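I asked this question before, and LotPings came up with a perfect result. After talking to the user it concerns, it turned out I had only been given half of the relevant information at first!

Now that I know exactly what is needed, I will describe the situation again...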

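Things to bear in mind:

  • The Terminal is always A followed by 3 digits, e.g. A123

  • The user ID is at the top of the log file and appears only once. It always starts with 89 and is six digits long. The line always begins with SELECTED FOR OPERATOR 89XXXX

  • There are two date patterns in the file (one is the search date, the other is the DOB), and each needs to be extracted into a separate column. Not all records have a DOB, and some have only the year.

  • The Enquirer does not always begin with 'C'; the whole remainder of the line is needed.

  • The search result is always preceded by 'ENQUIRY'; the text that follows it is what needs extracting.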
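The constraints listed above can be written down as regular expressions. The following is only a minimal sketch and is not part of the original script: the variable names `$TerminalRE`, `$OperatorRE` and `$DobRE` are hypothetical, and the DOB pattern assumes that a missing day or month shows up as blanks, as it does in the log sample.

```powershell
# Hypothetical patterns illustrating the constraints listed above.
$TerminalRE = [RegEx]'\bA\d{3}\b'                                    # A followed by 3 digits
$OperatorRE = [RegEx]'SELECTED FOR OPERATOR\s+(?<UserID>89\d{4})\b'  # six digits starting with 89
$DobRE      = [RegEx]'DATE OF BIRTH\s*:\s*(?<DOB>[0-9 ]{0,2}/[0-9 ]{0,2}/\d{4})'  # day/month may be blank

# Lines copied from the sample log
$HeaderLine = '        SELECTED FOR OPERATOR  891234'
$RecordLine = '01/05/18 1603       A555        CART87565       46573 RBCO NPC SERVICES GW/10/0043'
$DobLine    = '                                    DATE OF BIRTH :   /  /1957'

$UserID   = $OperatorRE.Match($HeaderLine).Groups['UserID'].Value
$Terminal = $TerminalRE.Match($RecordLine).Value
$DOB      = $DobRE.Match($DobLine).Groups['DOB'].Value.Trim()

"$UserID | $Terminal | $DOB"
```

Anchoring the operator pattern on the literal text SELECTED FOR OPERATOR keeps it from picking up other six-digit numbers elsewhere in the log.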

Here is the log file

L      TRANSACTIONS LOGGED FROM 01/05/2018 0001 TO 31/05/2018 2359
        SELECTED FOR OPERATOR  891234

START                 TERMINAL    USER        ENQUIRER                    TERMINAL IP
========================================================================================================================
01/05/18 1603       A555        CART87565       46573 RBCO NPC SERVICES GW/10/0043                           
        SEARCH ENQUIRY               RECORD NO : S48456/06P     CHAPTER CODE =   
                                 RECORD DISPLAYED : S48853/98D

                                  PRINT REQUESTED : SINGLE RECORD
========================================================================================================================
03/05/18 1107       A555        CERT16574       BTD/54/1786 16475                                    
        REF ENQUIRY                  DHF ID : 58/94710W     CHAPTER CODE =   
                                 RECORD DISPLAYED : S585988/84H
========================================================================================================================
24/05/18 1015       A555        CERT15473       19625 CBRS DDS SERVICES NM/18/0199                           

        IMAGE ENQUIRY                      NAME : TREVOR SMITH CHAPTER CODE =  

                                    DATE OF BIRTH :   /  /1957
========================================================================================================================
24/05/18 1025       A555        CERT15473       15325 CBRS DDS SERVICES NM/12/0999                           
        REF ENQUIRY                  DDS ID : 04/102578R     CHAPTER CODE =  
========================================================================================================================

This is a sample of the log file, showing what needs to be extracted and under which heading.

[image not included]

into a CSV like this

[image not included]

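LotPings' PowerShell script did the job perfectly. I just need it to also extract the user ID from the top line, to cope with records that have no DOB, and to handle more than one kind of enquiry, i.e. REF ENQUIRY, SEARCH ENQUIRY and IMAGE ENQUIRY.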

$FileIn   = '.\SO_51209341_data.txt'
$TodayCsv = '.\SO_51209341_data.csv'

$RE1 = [RegEx]'(?m)(?<Date>\d{2}\/\d{2}\/\d{2}) (?<Time>\d{4}) +(?<Terminal>A\d{3}) +(?<User>C[A-Z0-9]+) +(?<Enquirer>.*)$'
$RE2 = [RegEx]'\s+SEARCH REF\s+NAME : (?<Enquiry>.+?) (PAGE|CHAPTER) CODE ='
$RE3 = [RegEx]'\s+DATE OF BIRTH : (?<DOB>[0-9 /]+?/\d{4})'

$Sections = (Get-Content $FileIn -Raw) -split "={30,}`r?`n" -ne ''

$Csv = ForEach($Section in $Sections){
    $Row= @{} | Select-Object Date, Time, Terminal, User, Enquirer, Enquiry, DOB
    $Cnt = 0
    if ($Section -match $RE1) {
        ++$Cnt
        $Row.Date     = $Matches.Date
        $Row.Time     = $Matches.Time
        $Row.Terminal = $Matches.Terminal
        $Row.User     = $Matches.User
        $Row.Enquirer = $Matches.Enquirer.Trim()
    }
    if ($Section -match $RE2) {
        ++$Cnt
        $Row.Enquiry  = $Matches.Enquiry
    }
    if ($Section -match $RE3){
        ++$Cnt
        $Row.DOB      = $Matches.DOB
    }
    if ($Cnt -eq 3) {$Row}
}

$csv | Format-Table
$csv | Export-Csv $Todaycsv -NoTypeInformation

1 answer:

Answer 0 (score: 1)

With data this precise, a first answer could be:

## Q:\Test\2018\07\12\SO_51311417.ps1
$FileIn   = '.\SO_51311417_data.txt'
$TodayCsv = '.\SO_51311417_data.csv'

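# RE0: operator ID from the header section (appears once, at the top of the file)
$RE0 = [RegEx]'SELECTED FOR OPERATOR\s+(?<UserID>\d{6})'
# RE1: first line of a record: date, time, terminal, then the rest of the line as Enquirer
$RE1 = [RegEx]'(?m)(?<Date>\d{2}\/\d{2}\/\d{2}) (?<Time>\d{4}) +(?<Terminal>A\d{3}) +(?<Enquirer>.*)$'
# RE2: any of the three enquiry types; captures the text up to PAGE/CHAPTER CODE
$RE2 = [RegEx]'\s+(SEARCH|REF|IMAGE) ENQUIRY\s+(?<SearchResult>.+?)\s+(PAGE|CHAPTER) CODE'
# RE3: date of birth, if present; day and month may be blank
$RE3 = [RegEx]'\s+DATE OF BIRTH : (?<DOB>[0-9 /]+?/\d{4})'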

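# Split the file into sections at the ===== separator lines and drop empty sections
$Sections = (Get-Content $FileIn -Raw) -split "={30,}`r?`n" -ne ''
# Fallback value in case no SELECTED FOR OPERATOR line is found
$UserID = "n/a"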
$Csv = ForEach($Section in $Sections){
    If ($Section -match $RE0){
        $UserID = $Matches.UserID
    } Else {
        $Row= @{} | Select-Object Date,Time,Terminal,UserID,Enquirer,SearchResult,DOB
        If ($Section -match $RE1){
            $Row.Date     = $Matches.Date
            $Row.Time     = $Matches.Time
            $Row.Terminal = $Matches.Terminal
            $Row.Enquirer = $Matches.Enquirer.Trim()
            $Row.UserID   = $UserID
        }
        If ($Section -match $RE2){
            $Row.SearchResult  = $Matches.SearchResult
        }
        If ($Section -match $RE3){
            $Row.DOB      = $Matches.DOB
        }
        $Row
    }
}

$Csv | Format-Table
$Csv | Export-Csv $TodayCsv -NoTypeInformation

Sample output

Date     Time Terminal UserID Enquirer                                           SearchResult           DOB
----     ---- -------- ------ --------                                           ------------           ---
01/05/18 1603 A555     891234 CART87565       46573 RBCO NPC SERVICES GW/10/0043 RECORD NO : S48456/06P
03/05/18 1107 A555     891234 CERT16574       BTD/54/1786 16475                  DHF ID : 58/94710W
24/05/18 1015 A555     891234 CERT15473       19625 CBRS DDS SERVICES NM/18/0199 NAME : TREVOR SMITH      /  /1957
24/05/18 1025 A555     891234 CERT15473       15325 CBRS DDS SERVICES NM/12/0999 DDS ID : 04/102578R
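
The DOB column above keeps the blank day and month exactly as they were logged. If a cleaner value is wanted in the CSV, a post-processing step could be added. This is only a sketch: the function name `Format-Dob` is hypothetical, and the rule it applies (drop blank parts, keep whatever remains) is an assumption about the desired output, not something stated in the question.

```powershell
# Hypothetical helper: reduce a partially blank DOB to the parts that are present.
function Format-Dob {
    param([string]$Dob)
    # Split on '/', trim each part, and discard parts that are empty.
    $Parts = @($Dob -split '/' | ForEach-Object { $_.Trim() } | Where-Object { $_ -ne '' })
    $Parts -join '/'
}

$YearOnly = Format-Dob '  /  /1957'   # year-only record from the sample log
$FullDate = Format-Dob '12/03/1957'   # made-up full date, for illustration only
$NoDob    = Format-Dob ''             # record without a DOB
```

In the script above it could be applied where the DOB is assigned, for example `$Row.DOB = Format-Dob $Matches.DOB`.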