I need a shell script or PowerShell script that counts how many times each letter occurs in a file.
Input:
this is the sample of this script.
This script counts similar letters.
Output:
t 9
h 4
i 8
s 10
e 4
a 2
...
Answer 0 (score: 2)
In PowerShell, you can use the Group-Object cmdlet:
function Count-Letter {
    param(
        [String]$Path,
        [Switch]$IncludeWhitespace,
        [Switch]$CaseSensitive
    )
    # Read the file, convert to char array, and pipe to Group-Object
    # Convert input string to lowercase if CaseSensitive is not specified
    $CharacterGroups = if($CaseSensitive){
        # -CaseSensitive keeps 'T' and 't' in separate groups
        (Get-Content $Path -Raw).ToCharArray() | Group-Object -NoElement -CaseSensitive
    } else {
        (Get-Content $Path -Raw).ToLower().ToCharArray() | Group-Object -NoElement
    }
    # Remove any whitespace character group if IncludeWhitespace parameter is not bound
    if(-not $IncludeWhitespace){
        $CharacterGroups = $CharacterGroups | Where-Object { "$($_.Name)" -match "\S" }
    }
    # Return the groups, letters first and count second in a default format-table
    $CharacterGroups | Select-Object @{Name="Letter";Expression={$_.Name}},Count
}
I ran this on my machine against your sample input plus a newline; the output is a table with Letter and Count columns.
Answer 1 (score: 1)
This one-liner should do it:
awk 'BEGIN{FS=""}{for(i=1;i<=NF;i++)if(tolower($i)~/[a-z]/)a[tolower($i)]++}
END{for(x in a)print x, a[x]}' file
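Two details of the one-liner are worth keeping in mind: splitting a record into single characters with an empty FS works in gawk and mawk but is not specified by POSIX, and the order of for (x in a) is unspecified, so the lines can come out in any order. A minimal sketch that avoids both, assuming a hypothetical wrapper named count_letters_awk (not part of the answer above), walks each line with substr and sorts the result:

```shell
#!/bin/sh
# count_letters_awk is a hypothetical helper name used only for illustration.
# It reads the files given as arguments, or stdin when there are none.
count_letters_awk() {
    # LC_ALL=C keeps [a-z] and the sort order limited to plain ASCII rules.
    LC_ALL=C awk '
    {
        line = tolower($0)                  # fold case so T and t count together
        for (i = 1; i <= length(line); i++) {
            c = substr(line, i, 1)          # one character at a time, no FS tricks
            if (c ~ /^[a-z]$/) count[c]++
        }
    }
    END { for (c in count) print c, count[c] }
    ' "$@" | LC_ALL=C sort -k2,2nr -k1,1    # highest count first, then by letter
}

# Usage: count the letters of a short string passed on stdin.
printf 'aAb\n' | count_letters_awk
```

The trailing sort is only there to make the output order stable; drop it if the order does not matter.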
Answer 2 (score: 0)
"this is the sample of this script".ToCharArray() | group -NoElement | sort Count -Descending | where Name -NE ' '
Answer 3 (score: 0)
echo "this is the sample of this script. \
This script counts similar letters." | \
grep -o '.' | sort | uniq -c | sort -rg
Output, sorted with the most frequent characters first:
10 s
10
8 t
8 i
4 r
4 h
4 e
3 p
3 l
3 c
2 o
2 m
2 a
2 .
1 u
1 T
1 n
1 f
Note: there is no need for sed or awk; a simple grep -o '.' does all the heavy lifting. To skip whitespace and punctuation, replace '.' with '[[:alpha:]]':
echo "this is the sample of this script. \
This script counts similar letters." | \
grep -o '[[:alpha:]]' | sort | uniq -c | sort -rg
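To see why grep -o carries the pipeline, it can help to look at each stage on a tiny input. The sketch below is only an illustration; split_chars is a hypothetical helper name, and the input string abca is made up for the demo:

```shell
#!/bin/sh
# split_chars is a hypothetical helper name used only for this illustration.
# grep -o prints every match on its own line, and '.' matches any single character.
split_chars() {
    grep -o '.'
}

# Stage 1: one character per line.
printf 'abca\n' | split_chars

# Stage 2: sort puts identical characters next to each other.
printf 'abca\n' | split_chars | sort

# Stage 3: uniq -c collapses each run of identical lines into "count character".
printf 'abca\n' | split_chars | sort | uniq -c
```

The final sort -rg in the answer then only reorders those count lines numerically, largest first.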
To count uppercase and lowercase letters as one, use the --ignore-case option of both sort (-f) and uniq (-i):
echo "this is the sample of this script. \
This script counts similar letters." | \
grep -o '[[:alpha:]]' | sort -f | uniq -ic | sort -rg
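The pipelines in this answer read their text from echo, while the question asks about a file. A minimal sketch of the same idea wrapped in a function follows; count_letters_file is a hypothetical name, and it swaps in tr to fold case before sorting instead of relying on the ignore-case options, then prints "letter count" in the format the question shows:

```shell
#!/bin/sh
# count_letters_file is a hypothetical helper name used only for illustration.
# It reads the files given as arguments, or stdin when there are none.
count_letters_file() {
    cat -- "$@" |
        grep -o '[[:alpha:]]' |          # one letter per line, everything else dropped
        tr '[:upper:]' '[:lower:]' |     # fold case so T and t are counted together
        sort |
        uniq -c |                        # "count letter"
        sort -k1,1nr -k2,2 |             # highest count first, ties by letter
        awk '{ print $2, $1 }'           # swap to "letter count"
}

# Usage: count the letters of a short string passed on stdin.
printf 'aAb\n' | count_letters_file
```

Folding case with tr up front means the later stages never see mixed case, so their behavior does not depend on the locale's collation of upper and lower case.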
Answer 4 (score: -1)
echo "this is the sample of this script" | \
sed -e 's/ //g' -e 's/\([A-Za-z]\)/\1|/g' | tr '|' '\n' | \
sort | grep -v "^$" | uniq -c | \
awk '{printf "%s %s\n",$2,$1}'
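The sed and tr steps above exist only to put each letter on its own line. As a rough alternative sketch (not the answerer's method), the same effect can be had by deleting every non-letter with tr -cd and breaking the remainder into one-character lines with fold; count_letters_fold is a hypothetical name, and like the pipeline above it stays case-sensitive:

```shell
#!/bin/sh
# count_letters_fold is a hypothetical helper name used only for illustration.
# It reads text on stdin and prints "letter count", one letter per line.
count_letters_fold() {
    tr -cd '[:alpha:]' |                 # keep letters only (newlines are removed too)
        fold -w 1 |                      # break the stream into one character per line
        LC_ALL=C sort |                  # byte order: uppercase before lowercase
        uniq -c |
        awk '{ printf "%s %s\n", $2, $1 }'
}

# Usage: spaces are ignored, letters are counted as they appear.
printf 'abca b\n' | count_letters_fold
```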