Finding the most frequent letter in a text

Time: 2014-05-21 06:43:15

Tags: r

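I have to find the most frequently occurring letter in text that is stored in a table layout.

I was given this to help me, but I'm not sure how to use it.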

letter.count <- function(text, letters) {
    ## count the number of times letters appears in the text
    return(sum(unlist(strsplit(text, "")) %in% letters))
}
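
For reference, here is a minimal sketch of how this helper might be called. The sample string is made up for illustration, and looping over R's built-in `letters` vector with `sapply` is just one possible way to get a per-letter count.

```r
# Helper from the question: counts how many characters of `text` are in `letters`.
letter.count <- function(text, letters) {
    sum(unlist(strsplit(text, "")) %in% letters)
}

# Hypothetical sample string, for illustration only.
sample.text <- "hello world"

# Count how often a single letter occurs.
letter.count(sample.text, "l")

# Count each lowercase letter separately (built-in `letters` is a-z),
# then pick the name of the most frequent one.
counts <- sapply(letters, function(ch) letter.count(sample.text, ch))
names(counts)[which.max(counts)]
```

Note that the comparison is case-sensitive, so uppercase characters are not matched against the lowercase `letters` vector.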

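The data is a set of tweets, and I have to find the average frequency for the sample. The data table is set up so that the tweets are on one side and the negatives on the other. I managed to isolate the negative tweets from all of the tweets, and now I just need to find the most common letter across all of them.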

1 answer:

Answer 0: (score: 1)

You can use table to do something like this:
locate.letters <- function(text, letters){
    x <- unlist(strsplit(text, ''))
    tt <- table(x[x %in% letters])
    list(table = tt, sum = sum(tt), max = tt[which.max(tt)])
}

> txt
## [1] "I am a geography student. I am interested in mining tweets for 
## geographic data in support of my thesis on the new Geography. I know maps
## are being developed by some developers. I would like to be able to 
## develop maps myself. How do I do that? What is the process? 
## Thanks in advance."

> locate.letters(txt, letters[1:10])
## $table

##  a  b  c  d  e  f  g  h  i 
## 17  4  3 11 30  3  7  9 11 

## $sum
## [1] 95

## $max
##  e 
## 30
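
To connect this back to the tweet table in the question, the sketch below applies the same `table` approach to a whole column of tweets. The data frame, its column names (`text`, `negative`) and its contents are assumptions made up for illustration, and lower-casing with `tolower` is an added step so that "B" and "b" are counted together. The last line shows one possible reading of "average frequency" (occurrences of the top letter per tweet); the question does not define it.

```r
# Hypothetical stand-in for the tweet table; column names are assumptions.
tweets <- data.frame(
    text = c("Bad day, bad mood", "So sad today", "Great news"),
    negative = c(TRUE, TRUE, FALSE),
    stringsAsFactors = FALSE
)

# Keep only the negative tweets and lower-case them.
neg.text <- tolower(tweets$text[tweets$negative])

# Split every tweet into single characters and keep only a-z.
chars <- unlist(strsplit(neg.text, ""))
freq <- table(chars[chars %in% letters])

# Most frequent letter together with its count.
freq[which.max(freq)]

# One possible "average": occurrences of that letter per negative tweet.
max(freq) / length(neg.text)
```

Because `strsplit` accepts a character vector and `unlist` flattens the result, the counts are pooled across all selected tweets rather than computed per tweet.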