Question

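I currently have the string "abdicator". I would like to compare this word's letter frequencies against the whole English alphabet (all 26 letters), with output in the following form.

Output: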

a b c d e f g h i ... o ... r s t ... x y z
2 1 1 1 0 0 0 0 1 ... 1 ... 1 0 1 ... 0 0 0

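This output can be a numeric vector (named by the 26 letters). My initial attempt, in R, was to first split the string into individual letters with the strsplit function: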

strsplit("abdicator","") #split at every character
#[[1]]
#[1] "a" "b" "d" "i" "c" "a" "t" "o" "r"

However, I am a bit stuck on what to do next. Could someone enlighten me? Many thanks.

Answer 1

In R:

table(c(letters, strsplit("abdicator", "")[[1]]))-1
# a b c d e f g h i j k l m n o p q r s t u v w x y z 
# 2 1 1 1 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0

And extending it a bit to handle multiple words and/or the possibility of uppercase letters:

words <- c("abdicator", "Syzygy")
letterCount <- function(X) table(c(letters, strsplit(tolower(X), "")[[1]]))-1
t(sapply(words,  letterCount))
#           a b c d e f g h i j k l m n o p q r s t u v w x y z
# abdicator 2 1 1 1 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0
# Syzygy    0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 3 1
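
For readers more at home in Python, the pad-and-subtract idea used here (append all 26 letters so that every letter shows up at least once in the tally, then subtract 1 from each count) can be sketched roughly as follows. The names letter_count, words and table are made up for this illustration and are not part of the R answer.

```python
from collections import Counter
import string

def letter_count(word):
    """Count a-z in `word` with the pad-and-subtract trick (illustrative helper)."""
    # Pad with one copy of every letter so all 26 keys are present ...
    padded = Counter(string.ascii_lowercase + word.lower())
    # ... then take the padding off again, in alphabetical order.
    return {ch: padded[ch] - 1 for ch in string.ascii_lowercase}

words = ["abdicator", "Syzygy"]
table = {w: letter_count(w) for w in words}
print(list(table["abdicator"].values()))
# [2, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0]
```

In Python the padding is not strictly needed, since a Counter already reports 0 for letters it has not seen; the sketch only mirrors the R approach.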

Answer 2

In Python:

>>> from collections import Counter
>>> s = "abdicator"
>>> Counter(s)
Counter({'a': 2, 'c': 1, 'b': 1, 'd': 1, 'i': 1, 'o': 1, 'r': 1, 't': 1})
>>> list(map(Counter(s).__getitem__, map(chr, range(ord('a'), ord('z')+1))))
[2, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0]

Or:

>>> import string
>>> list(map(Counter(s).__getitem__, string.ascii_lowercase))
[2, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0]

Answer 3

In Python:

import collections
import string

counts = collections.Counter('abdicator')
chars = string.ascii_lowercase
print(*chars, sep=' ')
print(*[counts[char] for char in chars], sep=' ')

Answer 4

In Python 2:

import string, collections
ctr = collections.Counter('abdicator')
for l in string.ascii_lowercase:
    print l,
print
for l in string.ascii_lowercase:
    print ctr[l],
print

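In Python 3, only the syntax of print changes.

This produces exactly the output you requested. The core idea is that a collections.Counter, when indexed with a missing key, humbly returns 0, with the obvious semantics of "this key has been seen 0 times". That matches exactly the semantics it uses for keys that are present (it returns their count, i.e. the number of times they have been seen).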
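
A minimal Python 3 sketch of that missing-key behavior, reusing the ctr name from the snippet above:

```python
from collections import Counter

ctr = Counter("abdicator")

print(ctr["a"])    # key that is present: its count, here 2
print(ctr["z"])    # missing key: 0 rather than a KeyError

# Looking up a missing key does not add it to the counter.
print("z" in ctr)  # False
print(len(ctr))    # 8 distinct letters
```

Because lookups never insert keys, indexing the counter with all 26 letters leaves it unchanged.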

Matching a word's letter frequencies against the 26 letters in R (or Python)