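I have a set of .txt files, each containing one or two of the following strings:
"red", "blue", "green", "orange", "purple", ....
There are many more (50+) possibilities in the list.
If it helps, I can tell whether a .txt file contains one or two items, but not which ones they are. The string patterns are always on their own line.
I want the script to tell me specifically which one or two strings (from the master list) it found, and the order in which it found them (which one came first).
Since I have a large number of text files to search, I would like to write the results to a CSV file as I search: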
FILENAME1,first_match,second_match
file1.txt,blue,red
file2.txt,red, blue
file3.txt,orange,
file4.txt,purple,red
file5.txt,purple,
...
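I tried using many individual Select-String calls that return Boolean results to set variables whenever a match is found, but with this many possible strings it gets ugly very quickly. My searches on this problem haven't given me any new ideas. (I'm sure I'm not asking the question the right way.)
Do I need to loop through every line of text in every file?
Am I stuck with a process of elimination, checking for the presence of each search string in turn?
I'm looking for a more elegant way to solve this problem (if one exists).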
Answer 0 (score: 3)
Not very intuitive, but elegant...
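$regex = "(purple|blue|red)"
Get-ChildItem $env:TEMP\test\*.txt | Foreach-Object {
    # Start each output line with the file's full path
    $file = $_.FullName
    $result = $file
    # switch -Regex -File tests every line of the file against $regex
    switch -Regex -File $file
    {
        # Append each matched word in the order it appears in the file
        $regex {$result = "$($result),$($matches[1])"}
    }
    $result
}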
C:\Users\Lieven Keersmaekers\AppData\Local\Temp\test\file1.txt,blue,red
C:\Users\Lieven Keersmaekers\AppData\Local\Temp\test\file2.txt,red,blue
Here, file1 contains blue first, then red; file2 contains red first, then blue.
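To get closer to the CSV layout asked for in the question, the same switch technique can be wrapped in a small function that emits one object per file. This is only a sketch under a few assumptions: the function name Get-ColorMatch, the shortened $words list, and the whole-line anchoring (based on the question's statement that each string sits on its own line) are illustrative and not part of the original answer.

```powershell
# Hypothetical master list; the real one (50+ words) could be read from a file.
$words = 'red', 'blue', 'green', 'orange', 'purple'

# Build one alternation and anchor it so that only lines consisting of
# a single word (plus optional whitespace) count as a match.
$escaped = $words | ForEach-Object { [regex]::Escape($_) }
$regex = '^\s*(' + ($escaped -join '|') + ')\s*$'

function Get-ColorMatch {
    param(
        [string]$Path,      # file to scan
        [string]$Pattern    # regex whose first capture group is the word
    )
    $found = @()
    # switch -File reads the file line by line; -Regex fills $Matches on a hit.
    switch -Regex -File $Path {
        $Pattern { $found += $Matches[1] }
    }
    $first = ''
    $second = ''
    if ($found.Count -gt 0) { $first = $found[0] }
    if ($found.Count -gt 1) { $second = $found[1] }
    [pscustomobject]@{
        FILENAME     = Split-Path -Leaf $Path
        first_match  = $first
        second_match = $second
    }
}

# Example use (writes Results.csv to the current directory):
# Get-ChildItem -Path . -Filter *.txt |
#     ForEach-Object { Get-ColorMatch -Path $_.FullName -Pattern $regex } |
#     Export-Csv -NoTypeInformation -Path Results.csv -Encoding UTF8
```

Note that Export-Csv quotes every field, so the rows come out as "file1.txt","blue","red" rather than the bare form shown in the question.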
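Answer 1 (score: 1)
You can search with a regex to get the index (the start position within the line), and combine that with Select-String, which returns the line number, and you're good to go.
Select-String supports an array as the value for -Pattern, but unfortunately it stops on a line after the first match, even when you use -AllMatches (bug?). So we have to search once per word/pattern. Try: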
#List of words. Had to escape them because Select-String doesn't return Matches-objects (with Index/location) for SimpleMatch
$words = "purple","blue","red" | ForEach-Object { [regex]::Escape($_) }
#Can also use a list with word/sentence per line using $words = Get-Content patterns.txt | % { [regex]::Escape($_.Trim()) }

#Get all files to search
Get-ChildItem -Filter "test.txt" -Recurse | Foreach-Object {
    #Has to loop words because Select-String -Pattern "blue","red" won't return match for both pattern. It stops on a line after first match
    foreach ($word in $words) {
        $_ | Select-String -Pattern $word |
            #Select the properties we care about
            Select-Object Path, Line, Pattern, LineNumber, @{n="Index";e={$_.Matches[0].Index}}
    }
} |
    #Sort by File (to keep file-matches together), then LineNumber and Index to get the order of matches
    Sort-Object Path, LineNumber, Index |
    Export-Csv -NoTypeInformation -Path Results.csv -Encoding UTF8
Results.csv
"Path","Line","Pattern","LineNumber","Index"
"C:\Users\frode\Downloads\test.txt","file1.txt,blue,red","blue","3","10"
"C:\Users\frode\Downloads\test.txt","file1.txt,blue,red","red","3","15"
"C:\Users\frode\Downloads\test.txt","file2.txt,red, blue","red","4","10"
"C:\Users\frode\Downloads\test.txt","file2.txt,red, blue","blue","4","15"
"C:\Users\frode\Downloads\test.txt","file4.txt,purple,red","purple","6","10"
"C:\Users\frode\Downloads\test.txt","file4.txt,purple,red","red","6","17"
"C:\Users\frode\Downloads\test.txt","file5.txt,purple,","purple","7","10"
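As a possible variation, the words can be joined into a single alternation pattern instead of being passed as an array. With a single regex, -AllMatches exposes every hit on a line through the Matches property, so the per-word loop is not needed. The sketch below assumes this approach; the function name Get-OrderedMatch and the Word property are illustrative names, not taken from the answer above.

```powershell
# Escape each word and join the list into one alternation such as purple|blue|red.
$words = 'purple', 'blue', 'red'
$pattern = ($words | ForEach-Object { [regex]::Escape($_) }) -join '|'

function Get-OrderedMatch {
    param(
        [string]$Path,      # file to scan
        [string]$Pattern    # single regex containing all words
    )
    # Select-String walks the file top to bottom, and Matches lists the hits
    # on a line from left to right, so rows come out in order of appearance.
    Select-String -LiteralPath $Path -Pattern $Pattern -AllMatches |
        ForEach-Object {
            $info = $_
            foreach ($m in $info.Matches) {
                [pscustomobject]@{
                    Path       = $info.Path
                    LineNumber = $info.LineNumber
                    Index      = $m.Index
                    Word       = $m.Value
                }
            }
        }
}

# Example use (writes Results.csv to the current directory):
# Get-ChildItem -Filter "test.txt" -Recurse |
#     ForEach-Object { Get-OrderedMatch -Path $_.FullName -Pattern $pattern } |
#     Export-Csv -NoTypeInformation -Path Results.csv -Encoding UTF8
```

Because each file is emitted in reading order, no Sort-Object step is needed to recover which word came first within a file.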