Counting letters in a file in a shell script

Time: 2015-05-21 12:02:23

Tags: shell powershell scripting

I need a shell script or PowerShell script that counts how many times each letter occurs in a file.

Input:

this is the sample of this script.
This script counts similar letters.

Output:

t 9
h 4
i 8
s 10
e 4
a 2
...

5 answers:

Answer 0 (score: 2)

In PowerShell, you can do this with the Group-Object cmdlet:

function Count-Letter {
    param(
        [String]$Path,
        [Switch]$IncludeWhitespace,
        [Switch]$CaseSensitive
    )

    # Read the file, convert to char array, and pipe to group-object
    # Convert input string to lowercase if CaseSensitive is not specified
    $CharacterGroups = if($CaseSensitive){
        (Get-Content $Path -Raw).ToCharArray() | Group-Object -NoElement
    } else {
        (Get-Content $Path -Raw).ToLower().ToCharArray() | Group-Object -NoElement
    }

    # Remove any whitespace character group if IncludeWhitespace parameter is not bound
    if(-not $IncludeWhitespace){
        $CharacterGroups = $CharacterGroups |Where-Object { "$($_.Name)" -match "\S" }
    }

    # Return the groups, letters first and count second in a default format-table
    $CharacterGroups |Select-Object @{Name="Letter";Expression={$_.Name}},Count
}

On my machine I ran Count-Letter against your sample input plus a line break.

Answer 1 (score: 1)

This one-liner should do it:

awk  'BEGIN{FS=""}{for(i=1;i<=NF;i++)if(tolower($i)~/[a-z]/)a[tolower($i)]++}
      END{for(x in a)print x, a[x]}' file
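
The awk one-liner depends on FS="" splitting each line into single characters. gawk and mawk do this, but POSIX leaves the behavior of an empty FS unspecified. As a rough sketch of the same idea without that dependency, the variant below walks each line with substr(). The function name count_letters is made up for illustration, and the trailing sort is only there to make the order predictable, because awk's for-in order is arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: count letters case-insensitively with plain POSIX awk.
# Reads the files given as arguments, or standard input if there are none.
count_letters() {
    awk '{
        line = tolower($0)                    # fold case once per line
        n = length(line)
        for (i = 1; i <= n; i++) {
            c = substr(line, i, 1)            # character at position i
            if (c ~ /^[a-z]$/) count[c]++     # tally letters only
        }
    }
    END { for (c in count) print c, count[c] }' "$@" | sort
}

# Demo on the sample input from the question.
printf '%s\n' 'this is the sample of this script.' \
              'This script counts similar letters.' | count_letters
```

Each output line is a letter followed by its count, listed alphabetically.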

Answer 2 (score: 0)

PowerShell one-liner:

"this is the sample of this script".ToCharArray() | group -NoElement | sort Count -Descending | where Name -NE ' '

Answer 3 (score: 0)

echo "this is the sample of this script. \
This script counts similar letters." | \
    grep -o '.' | sort | uniq -c | sort -rg

Output, sorted with the most frequent characters first:

 10 s
 10  
  8 t
  8 i
  4 r
  4 h
  4 e
  3 p
  3 l
  3 c
  2 o
  2 m
  2 a
  2 .
  1 u
  1 T
  1 n
  1 f

Note: there is no need for sed or awk; a simple grep -o '.' does all the heavy lifting. To leave spaces and punctuation out of the count, replace '.' with '[[:alpha:]]':

echo "this is the sample of this script. \
This script counts similar letters." | \
    grep -o '[[:alpha:]]' | sort | uniq -c | sort -rg

To count uppercase and lowercase letters as one, use the --ignore-case option of sort and uniq (sort -f and uniq -i):

echo "this is the sample of this script. \
This script counts similar letters." | \
    grep -o '[[:alpha:]]' | sort -f | uniq -ic | sort -rg
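
The question asks about a file and a "letter count" layout, so here is a minimal sketch of the same grep pipeline adapted to that. It folds case up front with tr instead of relying on sort and uniq options, and reorders the columns with awk. The function name count_letters_in_file and the temporary file are made up for the demo, and grep -o is assumed to be available (GNU and BSD grep have it, POSIX does not require it).

```shell
#!/bin/sh
# Hypothetical helper: print "letter count" pairs for the file given as $1,
# most frequent letter first.
count_letters_in_file() {
    tr '[:upper:]' '[:lower:]' < "$1" |   # fold case so T and t count together
        grep -o '[[:alpha:]]' |           # one letter per line, no spaces or punctuation
        sort | uniq -c |                  # count identical lines
        sort -k1,1nr -k2,2 |              # highest count first, ties alphabetical
        awk '{ print $2, $1 }'            # reorder to "letter count"
}

# Demo on a temporary copy of the question's sample input.
sample=$(mktemp)
printf '%s\n' 'this is the sample of this script.' \
              'This script counts similar letters.' > "$sample"
count_letters_in_file "$sample"
rm -f "$sample"
```

Sorting on the count and then on the letter keeps the order of tied letters stable between runs.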

Answer 4 (score: -1)

# Strip spaces, put each letter on its own line, count duplicates, print "letter count".
echo "this is the sample of this script"  | \
sed -e 's/ //g' -e 's/\([A-Za-z]\)/\1|/g'  |  tr '|' '\n'  |  \
sort  |  grep -v "^$"  |  uniq -c  |  \
awk '{printf "%s %s\n",$2,$1}'