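Our users sometimes give us misspelled names/usernames, and I would like to be able to search Active Directory for approximate matches, sorted by closeness (any algorithm is fine). For example, if I try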
Get-Aduser -Filter {GivenName -like "Jack"}
I can find the user Jack, but not if I use "Jacck" or "ack".
Is there a simple way to do this?
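Answer 0 (score: 3)
You could calculate the Levenshtein distance between the two strings and check that it is below some threshold (perhaps 1 or 2). There is a PowerShell example here: Levenshtein distance in powershell
Example: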
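A minimal sketch of the idea, not the code from the linked page. `Get-EditDistance` is a hypothetical helper name; it computes the classic dynamic-programming Levenshtein distance (case-insensitive, because both strings are lower-cased first) and then keeps only names within 2 edits of the query.

```powershell
# Hypothetical helper (not from the linked page): classic dynamic-programming
# Levenshtein distance. Both strings are lower-cased, so the match ignores case.
function Get-EditDistance {
    param([string]$First, [string]$Second)
    $s = $First.ToLowerInvariant()
    $t = $Second.ToLowerInvariant()
    # $prev is the previous row of the DP table; row 0 is 0..len(t)
    $prev = [int[]](0..$t.Length)
    for ($i = 1; $i -le $s.Length; $i++) {
        $curr = New-Object int[] ($t.Length + 1)
        $curr[0] = $i
        for ($j = 1; $j -le $t.Length; $j++) {
            $cost = if ($s[$i - 1] -eq $t[$j - 1]) { 0 } else { 1 }
            # cheapest of deletion, insertion, and substitution (or free match)
            $curr[$j] = [math]::Min([math]::Min($prev[$j] + 1, $curr[$j - 1] + 1), $prev[$j - 1] + $cost)
        }
        $prev = $curr
    }
    $prev[$t.Length]
}

# Keep only names within 2 edits of the misspelled query, closest first.
$names = 'Jack', 'Jane', 'John', 'Jacques'
$query = 'Jacck'
$closeMatches = $names |
    ForEach-Object { [PSCustomObject]@{ Name = $_; Distance = (Get-EditDistance $query $_) } } |
    Where-Object { $_.Distance -le 2 } |
    Sort-Object Distance
$closeMatches
```

With these sample names only "Jack" survives (distance 1). "Jane", "John" and "Jacques" are 3 or more edits away. In a real search, `$names` would be, for example, the `GivenName` values returned by `Get-ADUser`.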
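Answer 1 (score: 1)
Interesting question and answers. A possibly simpler solution, though, is to search on several attributes, since I would expect most people to spell at least one of their names correctly :)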
Get-ADUser -Filter {GivenName -like "FirstName" -or SurName -Like "SecondName"}
Answer 2 (score: 0)
The Soundex algorithm was designed for exactly this situation. Here is some PowerShell code that might help:
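A minimal sketch of American Soundex, not the code originally attached to this answer. `Get-Soundex` is a hypothetical function name. Letters map to digit classes, vowels separate repeated codes, H and W do not, and the result is padded or truncated to four characters.

```powershell
# Hypothetical helper: American Soundex (first letter + 3 digits).
function Get-Soundex {
    param([string]$Name)
    # keep only A-Z letters, upper-cased
    $letters = $Name.ToUpperInvariant() -replace '[^A-Z]', ''
    if ($letters.Length -eq 0) { return '' }
    $codes = @{
        'B' = '1'; 'F' = '1'; 'P' = '1'; 'V' = '1'
        'C' = '2'; 'G' = '2'; 'J' = '2'; 'K' = '2'; 'Q' = '2'; 'S' = '2'; 'X' = '2'; 'Z' = '2'
        'D' = '3'; 'T' = '3'
        'L' = '4'
        'M' = '5'; 'N' = '5'
        'R' = '6'
    }
    $result = [string]$letters[0]
    # code of the first letter ($null for vowels, H, W, Y); it still blocks a repeat
    $last = $codes[[string]$letters[0]]
    for ($i = 1; $i -lt $letters.Length -and $result.Length -lt 4; $i++) {
        $ch = [string]$letters[$i]
        if ($ch -eq 'H' -or $ch -eq 'W') { continue }         # H/W do not separate equal codes
        $code = $codes[$ch]
        if ($null -eq $code) { $last = $null; continue }      # vowels and Y do separate them
        if ($code -ne $last) { $result += $code }
        $last = $code
    }
    $result.PadRight(4, '0')
}

# Names that sound like the misspelled query
$names = 'Jack', 'Jane', 'John', 'Jacques'
$soundexMatches = $names | Where-Object { (Get-Soundex $_) -eq (Get-Soundex 'Jacck') }
$soundexMatches
```

Here "Jacck" and "Jack" both encode to J200, so only "Jack" is returned. Soundex always keeps the first letter, so it will not match "ack" to "Jack". For that case, an edit-distance approach is the better fit.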
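Answer 3 (score: 0)
OK, based on the good answers I received (thanks @boxdog and @Palle Due), I'm posting a more complete answer.
Main source: https://github.com/gravejester/Communary.PASM - PowerShell Approximate String Matching. A great module on this topic.
Source: https://github.com/gravejester/Communary.PASM/tree/master/Functions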
# download the functions to the temp folder
$urls =
    "https://raw.githubusercontent.com/gravejester/Communary.PASM/master/Functions/Get-CommonPrefix.ps1",
    "https://raw.githubusercontent.com/gravejester/Communary.PASM/master/Functions/Get-LevenshteinDistance.ps1",
    "https://raw.githubusercontent.com/gravejester/Communary.PASM/master/Functions/Get-LongestCommonSubstring.ps1",
    "https://raw.githubusercontent.com/gravejester/Communary.PASM/master/Functions/Get-FuzzyMatchScore.ps1"
# local path for each file: the last segment of its URL, inside $env:TEMP
$paths = $urls | ForEach-Object { Join-Path $env:TEMP ($_.Split('/')[-1]) }
# GitHub only accepts TLS 1.2+, which Windows PowerShell 5.1 may not use by default
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
for ($i = 0; $i -lt $urls.Count; $i++) {
    Invoke-WebRequest -Uri $urls[$i] -OutFile $paths[$i]
}
# concatenate the functions into one file so we don't have to deal with source permissions
# (Set-Content overwrites, so running this again does not append duplicate copies)
Get-Content $paths | Set-Content "$env:TEMP\Fuzzy_score_functions.ps1"
# to keep it for later, open the temp folder with: Invoke-Item $env:TEMP
# then copy "Fuzzy_score_functions.ps1" somewhere else
# dot-source Fuzzy_score_functions.ps1
. "$env:TEMP\Fuzzy_score_functions.ps1"
Simple test:
Get-FuzzyMatchScore "a" "abc" # 98
Create the scoring function:
## start function
function get_score {
    param($searchQuery, $searchData, $nlist = 10, [switch]$levd)
    $i = 0
    $scores = foreach ($string in $searchData) {
        # progress from a running counter; calling IndexOf on every iteration
        # made the loop O(n^2) and was wrong for duplicate names
        $i++
        $pct = [math]::Round($i / $searchData.Count * 100)
        Write-Progress -Activity "Search in Progress" -Status "$pct% Complete:" -PercentComplete $pct
        try {
            if ($levd) {
                # Levenshtein distance: lower means closer
                $score = Get-LevenshteinDistance $searchQuery $string
            }
            else {
                # fuzzy match score: higher means closer
                $score = Get-FuzzyMatchScore -Search $searchQuery -String $string
            }
            [PSCustomObject]@{
                Score  = $score
                Result = $string
            }
        }
        catch { continue }   # skip any entry the scoring function throws on
    }
    # Levenshtein: smallest distance first; fuzzy: highest score first
    if ($levd) { $scores | Sort-Object Score, Result | Select-Object -First $nlist }
    else { $scores | Sort-Object Score, Result -Descending | Select-Object -First $nlist }
} ## end function
Examples
get_score "Karolin" @("Kathrin","Jane","John","Cameron")
# check the difference between Fuzzy and LevenshteinDistance mode
$names = "Ferris","Cameron","Sloane","Jeanie","Edward","Tom","Katie","Grace"
"Fuzzy"; get_score "Cam" $names
"Levenshtein"; get_score "Cam" $names -levd
Testing performance on a larger dataset
## download the baby-names dataset
$url = "https://github.com/hadley/data-baby-names/raw/master/baby-names.csv"
$output = "$env:TEMP\baby-names.csv"
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
Invoke-WebRequest -Uri $url -OutFile $output
$babynames = import-csv "$env:TEMP\baby-names.csv"
$babynames.count # 258000 lines
$babynames[0..3] # year, name, percent, sex
$searchdata = $babynames.name[0..499]
$query = "Waren" # missing letter
"Fuzzy"; get_score $query $searchdata
"Levenshtein"; get_score $query $searchdata -levd
$query = "Jon" # missing letter
"Fuzzy"; get_score $query $searchdata
"Levenshtein"; get_score $query $searchdata -levd
$query = "Howie" # lookalike
"Fuzzy"; get_score $query $searchdata;
"Levenshtein"; get_score $query $searchdata -levd
Timing test
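$query = "John"
$res = for ($i = 1; $i -le 10; $i++) {
    $searchdata = $babynames.name[0..($i * 100 - 1)]
    # Measure-Command returns a TimeSpan: use TotalMilliseconds, not Milliseconds
    # (Milliseconds is only the 0-999 component and wraps for runs over 1 s)
    $meas = Measure-Command { $null = get_score $query $searchdata }
    Write-Host $i
    [PSCustomObject]@{
        N           = $i * 100
        MS          = [math]::Round($meas.TotalMilliseconds)
        MS_per_line = [math]::Round($meas.TotalMilliseconds / $searchdata.Count, 2)
    }
}
$res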
+------+-----+-------------+
| N    | MS  | MS_per_line |
+------+-----+-------------+
| 100  | 696 | 6.96        |
| 200  | 544 | 2.72        |
| 300  | 336 | 1.12        |
| 400  | 6   | 0.02        |
| 500  | 718 | 1.44        |
| 600  | 452 | 0.75        |
| 700  | 224 | 0.32        |
| 800  | 912 | 1.14        |
| 900  | 718 | 0.8         |
| 1000 | 417 | 0.42        |
+------+-----+-------------+
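These times look erratic. This table was produced with $meas.Milliseconds, which is only the milliseconds component (0 to 999) of the TimeSpan returned by Measure-Command, not the total elapsed time. Any run longer than one second therefore wraps around, which is the likely reason the numbers do not grow with N. $meas.TotalMilliseconds gives the full duration.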
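A quick way to see the difference between the two TimeSpan properties is to build a TimeSpan by hand. The value below is 2.006 seconds, chosen arbitrarily and not taken from the table:

```powershell
# 0 days, 0 hours, 0 minutes, 2 seconds, 6 milliseconds
$elapsed = [TimeSpan]::new(0, 0, 0, 2, 6)
$elapsed.Milliseconds        # 6: only the millisecond component (0-999)
$elapsed.TotalMilliseconds   # 2006: the whole duration in milliseconds
```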
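The best way to do this depends on how your AD is organized. We have many OUs here, but regular users are in Users and DisabledUsers. The domain and DC components will also differ (I replaced them with <domain> and <DC> here).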
# One way to get a list of OUs
Get-ADOrganizationalUnit -Filter * -Properties CanonicalName |
Select-Object -Property CanonicalName
You can then use Where-Object -FilterScript {} to filter on each OU:
# example: export the users to a CSV file in the temp folder
Get-ADUser -Filter * |
    Where-Object -FilterScript {
        # [^,]+ instead of \w* so that CNs containing spaces (e.g. "CN=Jane Doe") also match
        ($_.DistinguishedName -match "CN=[^,]+,OU=DisabledUsers,DC=<domain>,DC=<DC>" -or
         $_.DistinguishedName -match "CN=[^,]+,OU=Users,DC=<domain>,DC=<DC>") -and
        $null -ne $_.GivenName   # remove users without a given name, such as test users
    } |
    Select-Object @{n = "Fullname"; e = { $_.GivenName + " " + $_.Surname }},
        GivenName, Surname, SamAccountName |
    Export-Csv -Path "$env:TEMP\all_Users.csv" -NoTypeInformation
# you can open the file to inspect
Invoke-Item "$env:TEMP\all_Users.csv"
# import
$allusers = Import-Csv "$env:TEMP\all_Users.csv"
$allusers.Count # number of lines
Usage:
get_score "Jane Done" $allusers.fullname 15 # return the 15 first
get_score "jdoe" $allusers.samaccountname 15