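I have a bunch of log files that need to be parsed so that some information can be extracted from them. A sample line (unfortunately, after trimming the sensitive data, the line looks like XML):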
<SerialNumber>xxxxxxxxx</SerialNumber><IP>X.X.X.X</IP><UserID>user@domain.com</UserID><NumOfFiles>1</NumOfFiles><LocaleID>ENU</LocaleID><Vendor>POLYCOM</Vendor><Model>VVX311</Model><Revision>Rev-A</Revision><CurrentTime>2018-03-12T02:42:59</CurrentTime><CurrentModule><FileName>cpe.nbt</FileName><FileVersion>
I want to get the IP (inside the IP tag) and the user's email address (between the UserID tags).
My current "solution":
$regex = "<UserID>"
$files = Get-ChildItem -path 'c:\path\*.log'
foreach ($infile in $files) {
    $res = select-string -Path $infile -Pattern $regex -AllMatches
    # take the last matching line in the file
    $txt = $res[$res.count-1]
    # get user (8 is the length of "<UserID>")
    $pos1= $txt.line.IndexOf("<UserID>")
    $pos2= $txt.line.IndexOf("</UserID>")
    $Puser = $txt.Line.Substring($pos1+8,$pos2-$pos1-8)
    ....
}
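It works, but I wonder whether a different approach would be better, and I would like to see how to do this with select-string -pattern ...
I tried a couple of "GUI" regex builders, but I could not figure out how to select what I need. Thanks.
PS:
The result after running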
$regex = '<IP>(.*)</IP>'
$res = select-string -Path $infile -Pattern $regex
$res
0312092535|cfg |4|00|DevUpdt|[LyncDeviceUpdateC::prepareAndSendRequest] '<?xml version="1.0" encoding="utf-8"?><Request><DeviceType>3PIP</DeviceType><MacAddress>11-11-11-11-11-11</MacAddress><SerialNumber>111111111111</SerialNumber><IP>10.1.1.1</IP><UserID>user@domain.com</UserID><NumOfFiles>1</NumOfFiles><LocaleID>ENU</LocaleID><Vendor>POLYCOM</Vendor><Model>VVX311</Model><Revision>Rev-A</Revision><CurrentTime>2018-03-12T09:25:35</CurrentTime><CurrentModule><FileName>cpe.nbt</FileName><FileVersion><Major>5</Major><M
Log file sample (100 KB+):
0312104211|nisvc|2|00|Invoker's nCommands,CurrentKey:2,(106)Responder
0312104211|nisvc|2|00|Response(-1)nisvc,(-1),(-1)app,(22),(Expiry,TransactionId,Time,Type):(-1,-1,1520844131,1)IndicationCode:(400)
0312104211|app1 |5|00|[CWPADServiceEwsRsp::execute] PAC file failed with ''
0312104301|cfg |4|00|DevUpdt|[LyncDeviceUpdateC::prepareAndSendRequest] '<?xml version="1.0" encoding="utf-8"?><Request><DeviceType>3PIP</DeviceType><MacAddress>11-11-11-11-11-11</MacAddress><SerialNumber>64167F2A8451</SerialNumber><IP>10.1.1.1</IP><UserID>user@domain.com</UserID><NumOfFiles>1</NumOfFiles><LocaleID>ENU</LocaleID><Vendor>POLYCOM</Vendor><Model>VVX311</Model><Revision>Rev-A</Revision><CurrentTime>2018-03-12T10:43:00</CurrentTime><CurrentModule><FileName>cpe.nbt</FileName><FileVersion><Major>5</Major><Minor>
0312104301|nisvc|2|00|Request(-1)nisvc,(701)NIServiceHttpReqMsgKey,(-1)proxy,(1001)AuthRsp,(Expiry,TransactionId,Time,Type):(45000,1306758696,1520844181,0)IndicationLevel:(200)
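Answer 0 (score: 1)
This code gets all the files, reads each one line by line, creates an object with the user and IP for every matching line, and collects those objects in an array.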
[regex]$ipUserReg = '(?<=<IP>)(.*)(?:<\/IP><UserID>)(.*)(?=<\/UserID>)'
$files = Get-ChildItem $path -filter *.log
$users = @(
    foreach ($fileToSearch in $files) {
        # pass the full path; a FileInfo object may stringify to just the file name
        $file = [System.IO.File]::OpenText($fileToSearch.FullName)
        while (!$file.EndOfStream) {
            $text = $file.ReadLine()
            # match once per line and reuse the result
            $match = $ipUserReg.Match($text)
            if ($match.Success) {
                New-Object psobject -Property @{
                    IP = $match.Groups[1].Value
                    User = $match.Groups[2].Value
                }
            }
        }
        $file.Close()
    }
)
To build my regular expressions I often use regexr.com, but note that PowerShell behaves slightly differently for some regexes.
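One way to check a pattern against the .NET regex engine that PowerShell uses is to run it on a sample line directly in the console. The snippet below is only a minimal sketch: the `$line` value is a shortened version of the log excerpt from the question, and the variable names `$line`, `$pattern`, `$m`, `$ip`, and `$user` are illustrative.

```powershell
# Sample line, shortened from the log excerpt in the question (assumed to be representative)
$line = "0312104301|cfg |4|00|DevUpdt|<SerialNumber>64167F2A8451</SerialNumber><IP>10.1.1.1</IP><UserID>user@domain.com</UserID><NumOfFiles>1</NumOfFiles>"

# Same pattern as in the answer; in .NET the forward slash needs no escaping, so \/ behaves like /
$pattern = '(?<=<IP>)(.*)(?:<\/IP><UserID>)(.*)(?=<\/UserID>)'

# Run the match once and read the two capture groups
$m = [regex]::Match($line, $pattern)
if ($m.Success) {
    $ip = $m.Groups[1].Value
    $user = $m.Groups[2].Value
    "IP=$ip User=$user"
}
```

If the groups come back empty or too long here, the pattern needs adjusting before it goes into the file loop.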
Edit: here is an example that uses select-string instead of reading line by line:
[regex]$ipUserReg = '(?<=<IP>)(.*)(?:<\/IP><UserID>)(.*)(?=<\/UserID>)'
$files = Get-ChildItem $path -filter *.log
$users = @(
    foreach ($fileToSearch in $files) {
        Select-String -Path $fileToSearch.FullName -Pattern $ipUserReg -AllMatches | ForEach-Object {
            $_.Matches | ForEach-Object {
                New-Object psobject -property @{
                    IP = $_.Groups[1].Value
                    User = $_.Groups[2].Value
                }
            }
        }
    }
)
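As a possible variation on the select-string example above, named capture groups can make it a bit clearer which value is which. This is only a sketch under a few assumptions: the temporary file name `sample-devupdt.log` and its two lines are hypothetical stand-ins for a real log, `[pscustomobject]` needs PowerShell 3.0 or later, and the `[^<]*` pattern assumes the values never contain a `<` character.

```powershell
# Hypothetical temp file standing in for one of the real logs
$tmp = Join-Path ([System.IO.Path]::GetTempPath()) 'sample-devupdt.log'
@(
    "0312104211|app1 |5|00|[CWPADServiceEwsRsp::execute] PAC file failed with ''"
    "0312104301|cfg |4|00|DevUpdt|<IP>10.1.1.1</IP><UserID>user@domain.com</UserID><NumOfFiles>1</NumOfFiles>"
) | Set-Content -Path $tmp

# Named groups "ip" and "user"; [^<]* stops at the start of the next tag
$pattern = '<IP>(?<ip>[^<]*)</IP><UserID>(?<user>[^<]*)</UserID>'

$users = @(
    Select-String -Path $tmp -Pattern $pattern -AllMatches | ForEach-Object {
        foreach ($m in $_.Matches) {
            # One object per match, read by group name instead of by index
            [pscustomobject]@{
                IP   = $m.Groups['ip'].Value
                User = $m.Groups['user'].Value
            }
        }
    }
)
$users

# Clean up the hypothetical sample file
Remove-Item $tmp
```

To keep only the last match per file, as the original script in the question does, piping the Select-String output through `Select-Object -Last 1` before building the object should work.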