PowerShell - Fast way to search files based on rules

Time: 2018-03-24 14:41:01

Tags: powershell

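I'm mainly looking for some pointers and a bit of code. My task is to search a large number of files for different strings and create a log of the matches.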

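Initially I parsed each file looking for a single string, but once I had thousands of files of about 1 MB each, that became far too slow. So I'd like to try opening each file only once and scanning it for multiple strings, filing each match under one of several rules in the log.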
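A minimal sketch of the "open each file once, test every rule on each line" idea, assuming the rules are held in an ordered dictionary of RuleID to search string. `Search-FileOnce` is a hypothetical helper name, and the literal, case-insensitive `IndexOf` test is just one choice (a regex or `-like` would also work).

```powershell
# Hypothetical helper: reads one file a single time and tests every rule on each line.
# $Rules is assumed to be an (ordered) dictionary of RuleID -> literal search string.
function Search-FileOnce {
    param(
        [string]$Path,
        [System.Collections.IDictionary]$Rules
    )
    $lineNumber = 0
    # ReadLines streams the file line by line instead of loading it all at once
    foreach ($line in [System.IO.File]::ReadLines($Path)) {
        $lineNumber++
        foreach ($rule in $Rules.GetEnumerator()) {
            # Literal, case-insensitive substring test (no regex or wildcard semantics)
            if ($line.IndexOf($rule.Value, [System.StringComparison]::OrdinalIgnoreCase) -ge 0) {
                [pscustomobject]@{
                    RuleID       = $rule.Key
                    RuleString   = $rule.Value
                    LineNumber   = $lineNumber
                    LineFromFile = $line
                    FileName     = $Path
                }
            }
        }
    }
}
```

Usage might look like `Search-FileOnce -Path (Resolve-Path .\test2.txt).Path -Rules ([ordered]@{ Rule2 = 'this is text1'; Rule3 = 'this is text3' })`. A full path is passed because .NET methods resolve relative paths against the process working directory, which can differ from the current PowerShell location. `File.ReadLines` also has an overload that takes an explicit `Encoding`.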

I created the following rules file:

{"Logs": {
   "Component":
   {
     "Files":[
       {
         "name": "test.txt",
         "encoding": "UTF8",
         "rules":[{
           "Rule1":"this is text"
           }]
       },
       {
         "name": "test2.txt",
         "encoding": "UTF8",
         "rules":[{
            "Rule2": "this is text1",
            "Rule3": "this is text3"
            }]
       }
     ]
   }
}}

It probably needs improvement and can be changed. The following PowerShell uses the rules to search the files:

Function ParseFile($Files){
write-host "Parsing file" $Files.Name "for text " $Files.rules

 Get-ChildItem "." -Recurse -Filter $Files.Name | 
   Foreach-Object {
     write-host $_.FullName

     Foreach($line in Get-Content $_.FullName -encoding $Files.encoding ) {

     ##Check if the current line from file matches a rule from the $Files.Rules array.
     ##If so log the file, line and rule ID to a CSV file. E.g.:
     ##RuleID, RuleString, LineFromFile, FileName

     }
   } 
}

$JSON = Get-Content -Raw -Path rule.json | ConvertFrom-Json

foreach ($files in $JSON.Logs.Component.Files  ){
  write-host $files.name
  write-host "============================="
  ParseFile $files
}

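Does the above make sense as the fastest way to search and classify? I'm not sure how to handle the commented section. I assumed something like `$line -in $Files.rules`, but I don't think this array is well suited to that.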
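A short illustration of why `-in` does not fit here, assuming the rules have already been reduced to a plain string array (in the question's JSON they are objects, so that step would come first): `-in` tests whether the whole line equals one of the elements, while the task needs a substring test per rule.

```powershell
# Rule strings for test2.txt, assumed here to be a plain string array
$rules = @('this is text1', 'this is text3')
$line  = 'foo this is text1 bar'

# -in asks "is the entire line equal to one of the array elements?" -> False here
$exact = $line -in $rules

# A substring test must be applied rule by rule; Where-Object keeps the rules that hit
# (String.Contains is case-sensitive)
$hits = $rules | Where-Object { $line.Contains($_) }

"exact: $exact"
"hits:  $($hits -join ', ')"
```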

Any suggestions are welcome, and thanks in advance.

2 Answers:

Answer 0 (score: 1)

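Here is an alternative using regular expressions. I modified the JSON to make it easier to parse. If needed, you can keep the original JSON by reading the RuleID and RuleString from the Name and Value properties in `$_.rules.psobject.properties`.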

This solution requires each RuleID to be a single word, because it is used as the name of a regex capture group.
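A sketch of the `psobject.properties` route mentioned above, assuming one file entry in the question's original rule shape (property name = rule ID). It flattens the properties into RuleID/Rule pairs, builds the same named-group pattern as the code below, and reads the rule ID back from the successful group. The sample line is made up for illustration.

```powershell
# One file entry in the question's original shape: property names are the rule IDs
$fileEntry = '{"rules":[{"Rule2":"this is text1","Rule3":"this is text3"}]}' | ConvertFrom-Json

# Flatten every property into a RuleID/Rule pair
$pairs = foreach ($ruleObject in $fileEntry.rules) {
    foreach ($prop in $ruleObject.psobject.Properties) {
        [pscustomobject]@{ RuleID = $prop.Name; Rule = $prop.Value }
    }
}

# One named group per rule, joined with '|'; group names must be word characters,
# which is why each RuleID has to be a single word
$pattern = ($pairs | ForEach-Object { "(?'$($_.RuleID)'$([regex]::Escape($_.Rule)))" }) -join '|'

$match = [regex]::Match('foo this is text3 bar', $pattern)
# Group '0' is the whole match; the successful named group identifies the rule
$rule = $match.Groups | Where-Object { $_.Name -ne '0' -and $_.Success }
"$($rule.Name): $($rule.Value)"
```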

rules.json

{"Logs": {
    "Component":
    {
        "Files":[
        {
            "name": "test.txt",
            "encoding": "UTF8",
            "rules":[{
                "RuleID": "Rule1",
                "Rule": "this is text"
            }]
        },
        {
            "name": "test2.txt",
            "encoding": "UTF8",
            "rules":[
            {
                "RuleID": "Rule2",
                "Rule": "this is text1"
            },
            {
                "RuleID": "Rule3",
                "Rule": "this is text3"
            }
            ]
        }
        ]
    }
}}

Code:

$JSON = Get-Content -Raw -Path rules.json | ConvertFrom-Json

$JSON.Logs.Component.Files | ForEach-Object {
    $item = $_

    #Create regex-pattern
    $pattern = ($item.rules | ForEach-Object { "(?'$($_.RuleID)'$([regex]::Escape($_.Rule)))" }) -join '|'

    #Find matching files
    Get-ChildItem -Path "." -Recurse -Filter $item.Name |
    Select-String -Pattern $pattern -Encoding $item.Encoding -AllMatches |
    ForEach-Object {
        $result = $_

        #Emit one row per match, so a line that hits several rules yields several rows
        foreach ($m in $result.Matches) {
            #Group '0' is the whole match; the successful named group is the rule
            $MatchedRule = $m.Groups | Where-Object { $_.Name -ne '0' -and $_.Success }

            New-Object -TypeName psobject -Property @{
                RuleID = $MatchedRule.Name
                RuleString = $MatchedRule.Value
                LineFromFile = $result.Line
                FileName = $result.Path
            }
        }
    }
} | Export-Csv -Path results.csv -NoTypeInformation -Encoding UTF8

results.csv:

"FileName","LineFromFile","RuleID","RuleString"
"D:\New folder\test.txt","foo this is text1 bar","Rule1","this is text"
"D:\New folder\test.txt","this is text3ss","Rule1","this is text"
"D:\New folder\test2.txt","foo this is text1 bar","Rule2","this is text1"
"D:\New folder\Test\test2.txt","this is text3ss","Rule3","this is text3"

Answer 1 (score: 0)

I adjusted your JSON slightly:

{"Logs": {
   "Component":
   {
     "Files":[
       {
         "name": "test.txt",
         "encoding": "UTF8",
         "rules":["this is text"
         ]
       },
       {
         "name": "test2.txt",
         "encoding": "UTF8",
         "rules":["this is text1",
          "this is text3"
         ]
       }
     ]
   }
}}

Using this, here is a possible solution:

$JSON = Get-Content -Raw -Path rules.json | ConvertFrom-Json

$JSON.Logs.Component.Files |
    ForEach-Object {
        $fileName = $_.Name
        $rules = $_.rules

        #Reads the file from the current directory only (no recursion)
        Get-Content $fileName -encoding $_.encoding |
            ForEach-Object {
                #Test the current line against every rule; RuleNumber is the 1-based index in "rules"
                for($i=0;$i -lt $rules.Count;$i++)
                {
                    if($_ -like "*$($rules[$i])*")
                    {
                        #Append one row per match to matches.csv
                        [PsCustomObject]@{RuleNumber = ($i+1);
                                          RuleString = $rules[$i];
                                          MatchingText = $_;
                                          File = $fileName} |
                            Export-Csv matches.csv -Append -NoTypeInformation
                    }
                }
            }
    }
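One caveat with the `-like` test: the rule string is spliced into a wildcard pattern, so `*`, `?`, `[` and `]` inside a rule are interpreted rather than matched literally. A small illustration using a made-up rule `[USD]` (not one of the question's rules):

```powershell
$line = 'Unit price'
$rule = '[USD]'

# -like treats [USD] as a character class ("any one of U, S, D"),
# so this is True even though the text '[USD]' does not occur in the line
$wildcardHit = $line -like "*$rule*"

# A literal, case-insensitive substring test without wildcard semantics -> False
$literalHit = $line.IndexOf($rule, [System.StringComparison]::OrdinalIgnoreCase) -ge 0

"wildcard: $wildcardHit  literal: $literalHit"
```

`String.IndexOf` with `OrdinalIgnoreCase` keeps the case-insensitivity of `-like` while matching the rule text literally; `Select-String -SimpleMatch` is another literal option.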