Question

I need a shell script or PowerShell script that counts how many times each letter occurs in a file.

Input:

this is the sample of this script.
This script counts similar letters.

Output:

t 9
h 4
i 8
s 10
e 4
a 2
...

Answer 1

In PowerShell, you can do this with the Group-Object cmdlet:

function Count-Letter {
    param(
        [String]$Path,
        [Switch]$IncludeWhitespace,
        [Switch]$CaseSensitive
    )

    # Read the whole file as a single string;
    # fold it to lowercase unless CaseSensitive is specified
    $Text = Get-Content -Path $Path -Raw
    if(-not $CaseSensitive){
        $Text = $Text.ToLower()
    }

    # Convert to a char array and pipe to Group-Object, passing the switch through
    # so that upper- and lowercase letters stay in separate groups when requested
    $CharacterGroups = $Text.ToCharArray() | Group-Object -NoElement -CaseSensitive:$CaseSensitive

    # Remove any whitespace character group if IncludeWhitespace parameter is not bound
    if(-not $IncludeWhitespace){
        $CharacterGroups = $CharacterGroups |Where-Object { "$($_.Name)" -match "\S" }
    }

    # Return the groups, letters first and count second in a default format-table
    $CharacterGroups |Select-Object @{Name="Letter";Expression={$_.Name}},Count
}

This is what the output of Count-Letter looks like on my machine with your sample input plus a line break:

Answer 2

This one-liner should do it:

awk  'BEGIN{FS=""}{for(i=1;i<=NF;i++)if(tolower($i)~/[a-z]/)a[tolower($i)]++}
      END{for(x in a)print x, a[x]}' file
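Two details of the awk command are worth knowing: the order in which `for (x in a)` visits keys is unspecified, and splitting a line into single characters with an empty FS is an extension found in gawk and mawk rather than behavior POSIX guarantees. A possible more portable sketch walks each line with substr and sorts the result by count. The function name count_letters_awk is made up for illustration, and the input is assumed to be plain ASCII text:

```shell
# Hypothetical helper: count ASCII letters case-insensitively, most frequent first.
# Reads the files given as arguments, or standard input when there are none.
count_letters_awk() {
    awk '{
        line = tolower($0)
        # Walk the line one character at a time instead of relying on FS=""
        for (i = 1; i <= length(line); i++) {
            c = substr(line, i, 1)
            if (c ~ /[a-z]/) count[c]++
        }
    }
    END { for (c in count) print c, count[c] }' "$@" |
        sort -k2,2nr -k1,1   # by count descending, then by letter
}
```

With the sample input from the question, `count_letters_awk file` should start with `s 10`, followed by `t 9`.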

Answer 3

PowerShell one-liner:

"this is the sample of this script".ToCharArray() | group -NoElement | sort Count -Descending | where Name -NE ' '

Answer 4

echo "this is the sample of this script. \
This script counts similar letters." | \
    grep -o '.' | sort | uniq -c | sort -rg

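The output is sorted, with the most common letters first.

Note: there is no need for sed or awk; a simple grep -o '.' does all the heavy lifting. To leave out spaces and punctuation, replace '.' with '[[:alpha:]]':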

echo "this is the sample of this script. \
This script counts similar letters." | \
    grep -o '[[:alpha:]]' | sort | uniq -c | sort -rg

To count uppercase and lowercase letters as one, use the --ignore-case option of both sort (-f) and uniq (-i):

echo "this is the sample of this script. \
This script counts similar letters." | \
    grep -o '[[:alpha:]]' | sort -f | uniq -ic | sort -rg
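The question asks about a file, while the pipelines above read from echo. As a minimal sketch, the same grep/sort/uniq approach can be wrapped in a function that takes a file name and prints "letter count" pairs in the format the question requests. The name count_letters is hypothetical, tr is swapped in here to fold case instead of the ignore-case options, and grep -o is assumed to be available (GNU and BSD grep both provide it):

```shell
# Hypothetical helper: print "letter count" for every letter in the file "$1",
# ignoring case, most frequent letter first.
count_letters() {
    grep -o '[[:alpha:]]' "$1" |        # one letter per line
        tr '[:upper:]' '[:lower:]' |    # fold case so "T" and "t" count together
        sort |
        uniq -c |                       # prefix each distinct letter with its count
        sort -k1,1nr -k2,2 |            # by count descending, then by letter
        awk '{ print $2, $1 }'          # swap columns to "letter count"
}
```

Called as `count_letters file` on the sample input, this should list `s 10` first and include the line `t 9`, matching the counts in the question.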

Answer 5

echo "this is the sample of this script"  | \
sed -e 's/ //g' -e 's/\([A-Za-z]\)/\1|/g'  |  tr '|' '\n'  |  \
sort  |  grep -v "^$"  |  uniq -c  |  \
awk '{printf "%s %s\n",$2,$1}'

Counting the letters in a file in a shell script
