Question

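I have been looking for a way to count how many times specific characters appear next to each other in a string. Every approach I have found only counts how many times a character such as "A" appears in the string in total.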

Example string:
0xAAABBC0123456789AABBCCDD0123456789ABCDEF

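Each string is 43 characters long and starts with "0x". Each string contains only the following characters, in random order: 0-9 and A-F (16 different characters in total). Each character can occur several times in a row, for example "AAA" or "111".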

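What I am interested in is the maximum number of times each of the 16 characters occurs in a row within a string, checked across all of my strings.

So far I have only come up with this PowerShell script, which counts how many times each character occurs per line: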

Get-Content "C:\Temp\strings.txt" | ForEach-Object{
    New-Object PSObject -Property @{
        Strings = $_
        Row = $_.ReadCount
        9 = [regex]::matches($_,"9").count
        D = [regex]::matches($_,"D").count
        B = [regex]::matches($_,"B").count
        C = [regex]::matches($_,"C").count
        7 = [regex]::matches($_,"7").count
        3 = [regex]::matches($_,"3").count
        1 = [regex]::matches($_,"1").count
        8 = [regex]::matches($_,"8").count
        F = [regex]::matches($_,"F").count
        2 = [regex]::matches($_,"2").count
        4 = [regex]::matches($_,"4").count
        E = [regex]::matches($_,"E").count
        6 = [regex]::matches($_,"6").count
        5 = [regex]::matches($_,"5").count
        A = [regex]::matches($_,"A").count
        0 = [regex]::matches($_,"0").count
    }
} | Sort Count -Descending | Export-Csv -Path "C:\Temp\output.csv" -NoTypeInformation

I would prefer to do this in PowerShell, but if there is another way that makes this easier, please let me know.

Answer 1

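One approach is to iterate over the source string character by character and keep track of how many times each character has been seen. This is easy to do with a hashtable. Like so: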

# Hashtable initialization. Add keys for 0-9A-F:
# Each char has initial count 0
$ht = @{}
"ABCDEF0123456789".ToCharArray() | % {
    $ht.Add($($_.ToString()), 0)
}

# Test data, the 0x prefix will contain one extra zero
$s = "0xAAABBC0123456789AABBCCDD0123456789ABCDEF"    

# Convert data to char array for iteration
# Increment value in hashtable by using the char as key
$s.ToCharArray() | % { $ht[$_.ToString()]+=1 }

# Check results
PS C:\> $ht

Name                           Value
----                           -----
B                              5
3                              2
5                              2
x                              1
9                              2
2                              2
8                              2
0                              3
1                              2
E                              1
7                              2
F                              1
6                              2
4                              2
D                              3
A                              6
C                              4
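
The hashtable above holds the total number of occurrences per character. The same character-by-character loop can also be adapted to track the longest consecutive run, which is what the question asks for. The following is only a minimal sketch: it assumes the "0x" prefix should be skipped with Substring(2), and the names $maxRun, $prev, and $run are illustrative, not part of the answer above.

```powershell
$s = "0xAAABBC0123456789AABBCCDD0123456789ABCDEF"

# Longest run seen so far for each character (illustrative name)
$maxRun = @{}
"0123456789ABCDEF".ToCharArray() | % { $maxRun["$_"] = 0 }

$prev = ''   # previous character, as a string
$run  = 0    # length of the run currently being scanned

# Assumption: skip the "0x" prefix, then walk the remaining characters
foreach ($c in $s.Substring(2).ToCharArray()) {
    $key = "$c"
    # Extend the current run, or start a new one when the character changes
    if ($key -ceq $prev) { $run++ } else { $run = 1 }
    # Remember the longest run per character
    if ($run -gt $maxRun[$key]) { $maxRun[$key] = $run }
    $prev = $key
}

$maxRun.GetEnumerator() | Sort Name
```

For the sample string this should report 3 for A, 2 for B, C, and D, and 1 for every other character.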

Answer 2

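Build a HexPair by iterating over the string position by position (omitting the last position), and increment a value in a hashtable using the HexPair as the key.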

$String = '0xAAABBC0123456789AABBCCDD0123456789ABCDEF'
$Hash=@{}
for ($i=2;$i -le ($string.length-2);$i++){
    $Hash[$($String.Substring($i,2))]+=1
}
$Hash.GetEnumerator()|ForEach-Object{
   [PSCustomObject]@{HexPair = $_.Name
                     Count = $_.Value}
} |Sort Count -Descending

Sample output:

HexPair Count
------- -----
BC          3
AB          3
AA          3
CD          2
BB          2
9A          2
89          2
78          2
67          2
56          2
45          2
34          2
23          2
12          2
01          2
EF          1
DE          1
DD          1
D0          1
CC          1
C0          1

Alternative output:

$Hash.GetEnumerator()|ForEach-Object{
    [PSCustomObject]@{HexPair = $_.Name
                      Count = $_.Value}
 } |Sort HexPair|group Count |%{"Count {0} {1}" -f $_.Name,($_.Group.HexPair -Join(', '))}|Sort

Count 1 C0, CC, D0, DD, DE, EF
Count 2 01, 12, 23, 34, 45, 56, 67, 78, 89, 9A, BB, CD
Count 3 AA, AB, BC

Answer 3

You can use a lookbehind and a backreference to split the string into groups of repeated characters:

$s = '0xAAABBC0123456789AABBCCDD0123456789ABCDEF'
$repeats = $s.Remove(0, 2) -split '(?<=(.))(?!\1|$)'

Now we can group the substrings by their first character:

$groups = $repeats |Group-Object {$_[0]} -AsHashTable

Finally, grab the longest sequence for each character:

'0123456789ABCDEF'.ToCharArray() |%{
    [pscustomobject]@{
        Character = "$_"
        MaxLength = "$($groups[$_] |Sort Length -Descending |Select -First 1)".Length
    }
}

You should end up with a list like this (for your example string):

Character MaxLength
--------- ---------
0                 1
1                 1
2                 1
3                 1
4                 1
5                 1
6                 1
7                 1
8                 1
9                 1
A                 3
B                 2
C                 2
D                 2
E                 1
F                 1
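
As a variation on the same idea, the runs can be matched directly instead of being produced by a split. This is only a sketch, assuming the pattern (.)\1* (one character followed by any number of repeats of itself) together with [regex]::Matches; the names $runs and $longest are illustrative.

```powershell
$s = '0xAAABBC0123456789AABBCCDD0123456789ABCDEF'

# Each match is one run of a repeated character, e.g. "AAA", "BB", "C"
$runs = [regex]::Matches($s.Substring(2), '(.)\1*') | % { $_.Value }

# Keep the longest run length per character (illustrative name)
$longest = @{}
foreach ($r in $runs) {
    $key = "$($r[0])"
    if ($r.Length -gt $longest[$key]) { $longest[$key] = $r.Length }
}

# Characters that never occur get a MaxLength of 0
'0123456789ABCDEF'.ToCharArray() | % {
    [pscustomobject]@{
        Character = "$_"
        MaxLength = [int]$longest["$_"]
    }
}
```

For the sample string this should produce the same Character/MaxLength values as the list above.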

Answer 4

This is how it ended up. Even though it gives me 15 extra rows for each string, I can easily filter out what I don't need in Microsoft Excel.

# All "0x" prefixes were removed from the text file before running this script,
# so the strings are split as-is (no .Remove(0, 2) needed)
$strings = Get-Content "C:\Temp\strings_without_0x.txt"
foreach ($s in $strings) {
    # Split each string into runs of identical characters
    $repeats = $s -split '(?<=(.))(?!\1|$)'

    # Group the runs by their first character
    $groups = $repeats | Group-Object {$_[0]} -AsHashTable

    # One row per character, sorted by the longest run, appended to the CSV
    '0123456789ABCDEF'.ToCharArray() | % {
        [pscustomobject]@{
            String    = "$s"
            Character = "$_"
            MaxLength = "$($groups[$_] | Sort Length -Descending | Select -First 1)".Length
        }
    } | Sort MaxLength -Descending | Export-Csv -Path "C:\Temp\output.csv" -NoTypeInformation -Append
}
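
As an alternative to filtering in Excel, rows with String, Character, and MaxLength columns can be reduced to one overall maximum per character in PowerShell itself. The sketch below is an assumption about how that could look, using Group-Object and Measure-Object; the $rows array is made-up stand-in data (in practice it could come from Import-Csv on the exported file), and $summary is an illustrative name.

```powershell
# Made-up stand-in rows for illustration only
$rows = @(
    [pscustomobject]@{ String = 's1'; Character = 'A'; MaxLength = 3 }
    [pscustomobject]@{ String = 's1'; Character = 'B'; MaxLength = 2 }
    [pscustomobject]@{ String = 's2'; Character = 'A'; MaxLength = 1 }
    [pscustomobject]@{ String = 's2'; Character = 'B'; MaxLength = 5 }
)

# One row per character, holding the largest MaxLength seen in any string
$summary = $rows | Group-Object Character | % {
    [pscustomobject]@{
        Character = $_.Name
        MaxLength = ($_.Group | Measure-Object MaxLength -Maximum).Maximum
    }
}

$summary | Sort MaxLength -Descending
```

With the stand-in data, A ends up with 3 and B with 5.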

Thanks for all the great answers!

Answer 5

Try this.

$out=@()
$string="0xAAABBC0123456789AABBCCDD0123456789ABCDEF"
$out+="Character,Count"
$out+='0123456789ABCDEF'.ToCharArray()|%{"$_," + ($string.split("$_")|Where-object{$_ -eq ""}).count}
ConvertFrom-Csv $out |sort count -Descending

This produces the following:

 Character Count
 --------- -----
 A         3    
 B         2    
 0         1    
 C         1    
 D         1    
 F         1    
 1         0    
 2         0    
 3         0    
 4         0    
 5         0    
 6         0    
 7         0    
 8         0    
 9         0    
 E         0

You can put this into a function like this:

function count_dups ($string){
   $out=@() # empty array
   $out+="Character,Count" # header
   $out+='0123456789ABCDEF'.ToCharArray()|%{"$_," + ($string.split("$_")|Where-object{$_ -eq ""}).count}
   return ConvertFrom-Csv $out | sort count -Descending
}

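The bulk of what I did here is in this line.

'0123456789ABCDEF'.ToCharArray()|%{"$_," + ($string.split("$_")|Where-object{$_ -eq ""}).count}

I split the string into an array on each character fed in from the character array '0123456789ABCDEF'. Then I count the empty elements in the resulting array. Each pair of adjacent occurrences of the character leaves one empty element between them, as does an occurrence at the very start or end of the string.

I only create the array $out so that the output can be formatted like your example.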

Number of times specific characters appear next to each other in a string

5 answers: