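I'm mainly looking for some pointers and some code. My task is to search a large number of files for different strings and create a log of the matches.
Initially I parsed each file looking for a single string, but once I had thousands of files of about 1 MB each, that became far too slow. So I want to open each file only once and scan it for multiple strings, classifying each match under the corresponding rule in the log.
I created the following rules file: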
{"Logs": {
"Component":
{
"Files":[
{
"name": "test.txt",
"encoding": "UTF8",
"rules":[{
"Rule1":"this is text"
}]
},
{
"name": "test2.txt",
"encoding": "UTF8",
"rules":[{
"Rule2": "this is text1",
"Rule3": "this is text3"
}]
}
]
}
}}
It probably needs improvement and can be changed. The following PowerShell uses the rules to search the files:
Function ParseFile($Files){
    write-host "Parsing file" $Files.Name "for text " $Files.rules
    Get-ChildItem "." -Recurse -Filter $Files.Name |
    Foreach-Object {
        write-host $_.FullName
        Foreach($line in Get-Content $_.FullName -encoding $Files.encoding ) {
            ##Check if the current line from file matches a rule from the $Files.Rules array.
            ##If so log the file, line and rule ID to a CSV file. E.g.:
            ##RuleID, RuleString, LineFromFile, FileName
        }
    }
}
$JSON = Get-Content -Raw -Path rule.json | ConvertFrom-Json
foreach ($files in $JSON.Logs.Component.Files ){
    write-host $files.name
    write-host "============================="
    ParseFile $files
}
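Does the above make sense as the fastest way to search and categorize? I'm not sure how to handle the commented section. I assume something like $line -in $Files.rules, but I don't think that array is a good fit for it.
Any suggestions are welcome, and thanks in advance.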
Answer 0: (score: 1)
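Here is an alternative that uses regular expressions. I modified the JSON to make it easier to parse. If needed, you could keep the original JSON and read the RuleID and RuleString from the Name and Value properties in $_.rules.psobject.properties.
This solution requires each RuleID to be a single word, because it is used as the name of a regex capture group.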
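As a rough illustration of the psobject.properties remark, here is a minimal sketch that flattens the rule layout from the question into RuleID/Rule pairs. The variable names ($sample, $ruleSet, $prop, $rules) and the inline JSON fragment are made up for this example; only the property layout comes from the question.

```powershell
# Hypothetical fragment in the ORIGINAL layout from the question:
# one object whose property names are the rule IDs.
$sample = @'
{"rules":[{"Rule2": "this is text1", "Rule3": "this is text3"}]}
'@ | ConvertFrom-Json

# Flatten every name/value pair into an object with RuleID and Rule properties.
$rules = foreach ($ruleSet in $sample.rules) {
    foreach ($prop in $ruleSet.psobject.Properties) {
        [pscustomobject]@{ RuleID = $prop.Name; Rule = $prop.Value }
    }
}

$rules
```

The resulting objects have the same RuleID and Rule properties as the entries in the modified rules.json, so they could be fed into the same pattern-building step.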
rules.json
{"Logs": {
"Component":
{
"Files":[
{
"name": "test.txt",
"encoding": "UTF8",
"rules":[{
"RuleID": "Rule1",
"Rule": "this is text"
}]
},
{
"name": "test2.txt",
"encoding": "UTF8",
"rules":[
{
"RuleID": "Rule2",
"Rule": "this is text1"
},
{
"RuleID": "Rule3",
"Rule": "this is text3"
}
]
}
]
}
}}
Code:
$JSON = Get-Content -Raw -Path rules.json | ConvertFrom-Json
$JSON.Logs.Component.Files | ForEach-Object {
    $item = $_
    #Create regex-pattern: one named group per rule, joined with |
    $pattern = ($item.rules | ForEach-Object { "(?'$($_.RuleID)'$([regex]::Escape($_.Rule)))" }) -join '|'
    #Find matching files
    Get-ChildItem -Path "." -Recurse -Filter $item.Name |
    Select-String -Pattern $pattern -Encoding $item.Encoding -AllMatches |
    ForEach-Object {
        $hit = $_
        #Emit one row per matched rule, so a line that matches several rules yields several rows
        $hit.Matches | ForEach-Object { $_.Groups } |
        Where-Object { $_.Name -ne '0' -and $_.Success } |
        ForEach-Object {
            New-Object -TypeName psobject -Property @{
                RuleID = $_.Name
                RuleString = $_.Value
                LineFromFile = $hit.Line
                FileName = $hit.Path
            }
        }
    }
} | Export-Csv -Path results.csv -NoTypeInformation -Encoding UTF8
results.csv:
"FileName","LineFromFile","RuleID","RuleString"
"D:\New folder\test.txt","foo this is text1 bar","Rule1","this is text"
"D:\New folder\test.txt","this is text3ss","Rule1","this is text"
"D:\New folder\test2.txt","foo this is text1 bar","Rule2","this is text1"
"D:\New folder\Test\test2.txt","this is text3ss","Rule3","this is text3"
Answer 1: (score: 0)
I tweaked your JSON slightly:
{"Logs": {
"Component":
{
"Files":[
{
"name": "test.txt",
"encoding": "UTF8",
"rules":["this is text"
]
},
{
"name": "test2.txt",
"encoding": "UTF8",
"rules":["this is text1",
"this is text3"
]
}
]
}
}}
Using this, here is one possible solution:
$JSON = Get-Content -Raw -Path rules.json | ConvertFrom-Json
$JSON.Logs.Component.Files |
    ForEach-Object {
        $fileName = $_.Name
        $rules = $_.rules
        Get-Content $fileName -Encoding $_.encoding |
            ForEach-Object {
                # Test the current line against every rule for this file
                for ($i = 0; $i -lt $rules.Count; $i++)
                {
                    if ($_ -like "*$($rules[$i])*")
                    {
                        [PsCustomObject]@{
                            RuleNumber   = ($i + 1)
                            RuleString   = $rules[$i]
                            MatchingText = $_
                            File         = $fileName
                        }
                    }
                }
            }
    } |
    # Write all rows once at the end instead of appending to the CSV for every match
    Export-Csv matches.csv -NoTypeInformation
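Since the rules here are plain strings, another option might be to let Select-String do the scanning instead of looping over lines by hand. The following is only a sketch under assumptions: the sample file name, its contents, and the variable names are invented for illustration. It relies on Select-String accepting several patterns at once with -SimpleMatch, and on the Pattern property of the resulting match objects to tell which rule was hit.

```powershell
# Hypothetical sample file, created in the temp directory for illustration only.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'test2-sample.txt'
Set-Content -Path $path -Value 'foo this is text1 bar', 'nothing here', 'this is text3ss' -Encoding UTF8

$rules = 'this is text1', 'this is text3'

# -SimpleMatch treats each rule as a literal string rather than a regex;
# the Pattern property of each result reports which rule matched the line.
$rows = Select-String -Path $path -Pattern $rules -SimpleMatch -Encoding UTF8 |
    ForEach-Object {
        [pscustomobject]@{
            RuleNumber   = [array]::IndexOf($rules, $_.Pattern) + 1
            RuleString   = $_.Pattern
            MatchingText = $_.Line
            File         = $_.Path
        }
    }

$rows
Remove-Item $path
```

One caveat, as far as I know: without -AllMatches a line is reported only once even if it contains several of the rule strings, so this fits best when at most one rule is expected per line. The rows could be piped to Export-Csv in the same way as above.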