PowerShell: retrieving a substring from text/files using regular expressions

Asked: 2018-03-12 13:04:50

Tags: regex powershell

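I have a bunch of log files that need to be parsed so that some information can be extracted from them. A sample line (unfortunately, after trimming the sensitive data, the line looks like XML):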

<SerialNumber>xxxxxxxxx</SerialNumber><IP>X.X.X.X</IP><UserID>user@domain.com</UserID><NumOfFiles>1</NumOfFiles><LocaleID>ENU</LocaleID><Vendor>POLYCOM</Vendor><Model>VVX311</Model><Revision>Rev-A</Revision><CurrentTime>2018-03-12T02:42:59</CurrentTime><CurrentModule><FileName>cpe.nbt</FileName><FileVersion>

I want to get the IP (inside the IP tag) and the user's email address (between the UserID tags).

My current "solution":

$regex = "<UserID>"

$files = Get-ChildItem -path 'c:\path\*.log'
foreach ($infile in $files) {
    $res = Select-String -Path $infile -Pattern $regex -AllMatches
    # use the last matching line in the file
    $txt = $res[$res.Count - 1]

    # get user
    $pos1 = $txt.Line.IndexOf("<UserID>")
    $pos2 = $txt.Line.IndexOf("</UserID>")
    $Puser = $txt.Line.Substring($pos1 + 8, $pos2 - $pos1 - 8)

    ....
}

It works, but I wonder whether a different approach would be better, and I would like to see how to do this with select-string -pattern ...

I tried several "GUI" regex builders, but I could not figure out how to select just the part I need. Thanks.

PS:

The result after running:
$regex = '<IP>(.*)</IP>'
$res = select-string -Path $infile -Pattern $regex
$res

0312092535|cfg  |4|00|DevUpdt|[LyncDeviceUpdateC::prepareAndSendRequest] '<?xml version="1.0" encoding="utf-8"?><Request><DeviceType>3PIP</DeviceType><MacAddress>11-11-11-11-11-11</MacAddress><SerialNumber>111111111111</SerialNumber><IP>10.1.1.1</IP><UserID>user@domain.com</UserID><NumOfFiles>1</NumOfFiles><LocaleID>ENU</LocaleID><Vendor>POLYCOM</Vendor><Model>VVX311</Model><Revision>Rev-A</Revision><CurrentTime>2018-03-12T09:25:35</CurrentTime><CurrentModule><FileName>cpe.nbt</FileName><FileVersion><Major>5</Major><M

Log file sample (100 KB+)

0312104211|nisvc|2|00|Invoker's nCommands,CurrentKey:2,(106)Responder
0312104211|nisvc|2|00|Response(-1)nisvc,(-1),(-1)app,(22),(Expiry,TransactionId,Time,Type):(-1,-1,1520844131,1)IndicationCode:(400)
0312104211|app1 |5|00|[CWPADServiceEwsRsp::execute] PAC file failed with ''
0312104301|cfg  |4|00|DevUpdt|[LyncDeviceUpdateC::prepareAndSendRequest] '<?xml version="1.0" encoding="utf-8"?><Request><DeviceType>3PIP</DeviceType><MacAddress>11-11-11-11-11-11</MacAddress><SerialNumber>64167F2A8451</SerialNumber><IP>10.1.1.1</IP><UserID>user@domain.com</UserID><NumOfFiles>1</NumOfFiles><LocaleID>ENU</LocaleID><Vendor>POLYCOM</Vendor><Model>VVX311</Model><Revision>Rev-A</Revision><CurrentTime>2018-03-12T10:43:00</CurrentTime><CurrentModule><FileName>cpe.nbt</FileName><FileVersion><Major>5</Major><Minor>
0312104301|nisvc|2|00|Request(-1)nisvc,(701)NIServiceHttpReqMsgKey,(-1)proxy,(1001)AuthRsp,(Expiry,TransactionId,Time,Type):(45000,1306758696,1520844181,0)IndicationLevel:(200)

1 answer:

Answer 0: (score: 1)

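This code gets all the files, reads each file line by line, creates an object with the user and IP for every matching line, and collects those objects in an array.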

# Group 1 captures the IP, group 2 the user; .*? is lazy so each capture stops at its own closing tag
[regex]$ipUserReg = '(?<=<IP>)(.*?)(?:<\/IP><UserID>)(.*?)(?=<\/UserID>)'
$files = Get-ChildItem $path -Filter *.log
$users = @(
    foreach ($fileToSearch in $files) {
        # Pass the full path so the file is found regardless of the current directory
        $file = [System.IO.File]::OpenText($fileToSearch.FullName)
        while (!$file.EndOfStream) {
            $text = $file.ReadLine()
            # Match once per line and reuse the result
            $m = $ipUserReg.Match($text)
            if ($m.Success) {
                New-Object psobject -Property @{
                    IP = $m.Groups[1].Value
                    User = $m.Groups[2].Value
                }
            }
        }
        $file.Close()
    }
)

To build my regular expressions I often use regexr.com, but note that PowerShell behaves slightly differently for some regular expressions.
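As a small illustration of one such difference (a sketch, not part of the original answer): PowerShell uses the .NET regex engine, where a pattern has no `/` delimiters, so escaping the slash as `\/` is harmless but unnecessary. The sample line below is shortened from the question, and the group names `ip` and `user` are chosen here purely for illustration.

```powershell
# Shortened sample line based on the log excerpt in the question.
$line = '<SerialNumber>111111111111</SerialNumber><IP>10.1.1.1</IP><UserID>user@domain.com</UserID><NumOfFiles>1</NumOfFiles>'

# No delimiters in .NET regex, so '/' is written without a backslash.
# '.*?' is lazy: each group stops at the first closing tag it meets.
$pattern = '<IP>(?<ip>.*?)</IP><UserID>(?<user>.*?)</UserID>'

if ($line -match $pattern) {
    # -match fills the automatic $Matches hashtable; named groups become keys.
    $ip   = $Matches['ip']
    $user = $Matches['user']
    "{0} {1}" -f $ip, $user   # 10.1.1.1 user@domain.com
}
```

Named groups are optional; numbered groups (`$Matches[1]`, `$Matches[2]`) work the same way with this operator.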

Edit: here is an example that uses select-string instead of reading line by line:

[regex]$ipUserReg = '(?<=<IP>)(.*?)(?:<\/IP><UserID>)(.*?)(?=<\/UserID>)'
$files = Get-ChildItem $path -filter *.log
$users = @(
    foreach ($fileToSearch in $files) {
        Select-String -Path $fileToSearch.FullName -Pattern $ipUserReg -AllMatches | ForEach-Object {
            $_.Matches | ForEach-Object{
                New-Object psobject -property @{
                    IP = $_.Groups[1].Value
                    User = $_.Groups[2].Value
                }
            }
        }
    }
)
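
The question's own code keeps only the last matching line of each file. A minimal sketch of that variant with select-string might look like the following. The function name `Get-LastIpUser`, the named groups `ip` and `user`, and the throwaway demo file are all assumptions made for illustration and do not come from the original post.

```powershell
# Hypothetical helper: returns IP and user from the last matching line of each file.
function Get-LastIpUser {
    param([Parameter(Mandatory)][string[]]$Path)

    # Illustrative named groups; '.*?' keeps each capture inside its own tag.
    $pattern = '<IP>(?<ip>.*?)</IP><UserID>(?<user>.*?)</UserID>'

    foreach ($file in Get-ChildItem -Path $Path -File) {
        # Select-String emits one MatchInfo per matching line; keep only the last one.
        $last = Select-String -Path $file.FullName -Pattern $pattern | Select-Object -Last 1
        if ($last) {
            $groups = $last.Matches[0].Groups
            [pscustomobject]@{
                File = $file.Name
                IP   = $groups['ip'].Value
                User = $groups['user'].Value
            }
        }
    }
}

# Demo against a temporary file with made-up log lines.
$demo = Join-Path ([System.IO.Path]::GetTempPath()) 'ipuser-demo.log'
Set-Content -Path $demo -Value @(
    '0312092535|cfg  |4|00|DevUpdt|<IP>10.1.1.1</IP><UserID>old@domain.com</UserID>'
    '0312104211|nisvc|2|00|Response(-1)nisvc'
    '0312104301|cfg  |4|00|DevUpdt|<IP>10.1.1.2</IP><UserID>user@domain.com</UserID>'
)
$result = Get-LastIpUser -Path $demo
$result
Remove-Item $demo
```

With real logs the call would be something like `Get-LastIpUser -Path 'c:\path\*.log'`, using the path from the question.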