I have a data frame of unigrams and their frequencies. I want an efficient way to split the vector of words by first letter. I know I can do it with grep:
uniPhrase <- unigramDF$phrase[order(unigramDF$phrase)]
as <- uniPhrase[grep(pattern = "^a", x = uniPhrase)]
bs <- uniPhrase[grep(pattern = "^b", x = uniPhrase)]
# ... one line per letter, through z
zs <- uniPhrase[grep(pattern = "^z", x = uniPhrase)]
But is there a way to do this with sapply?
Answer (score: 1)
Try
# one grep() per letter; letters with no matches give character(0)
lst <- setNames(lapply(paste0("^", letters),
                       function(x) uniPhrase[grep(x, uniPhrase)]),
                paste0(letters, "s"))
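Since the question asks about sapply specifically, the same idea can be written with sapply() directly. This is a sketch, not part of the original answer: it assumes R >= 3.3.0 for startsWith() (a fixed-prefix test used here instead of a regex), and the short uniPhrase vector is hypothetical sample data.

```r
# Hypothetical sample vector standing in for the sorted unigram phrases
uniPhrase <- c("apple", "avocado", "banana", "cherry", "zebra")

# sapply() over the 26 lowercase letters; simplify = FALSE keeps a list,
# and because X is a character vector, USE.NAMES = TRUE names it a..z
lst3 <- sapply(letters, function(l) uniPhrase[startsWith(uniPhrase, l)],
               simplify = FALSE)
names(lst3) <- paste0(names(lst3), "s")

lst3$as  # "apple" "avocado"
lst3$ds  # character(0)
```

simplify = FALSE matters here: without it, sapply() would collapse the result into a plain character vector whenever every letter happened to match exactly one word.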
Or
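# split() groups by the first character in a single pass
lst2 <- split(uniPhrase, substr(uniPhrase, 1, 1))
# only valid if all 26 letters occur; otherwise the lengths differ and this errors
names(lst2) <- names(lst)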
identical(lst2, lst)
#[1] TRUE
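If some letters never occur, or the words mix upper and lower case, the split() approach can be made to return all 26 groups by grouping on a factor with fixed levels. This is a sketch under assumptions not in the original: grouping is case-insensitive via tolower(), words starting with a non-letter are dropped, and the sample vector is hypothetical.

```r
# Hypothetical sample vector with mixed case and missing letters
uniPhrase <- c("Apple", "avocado", "banana", "zebra")

# Lower-cased first character as a factor whose levels are fixed to a..z;
# anything outside a..z becomes NA and is dropped by split()
first <- factor(tolower(substr(uniPhrase, 1, 1)), levels = letters)

# drop = FALSE (the default) keeps empty levels as character(0)
lst4 <- split(uniPhrase, first)
names(lst4) <- paste0(names(lst4), "s")
```

Because the levels always supply 26 names, renaming no longer depends on every letter being present in the data.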
# data
set.seed(48)
uniPhrase <- sample(paste0(letters, rep(paste0("word", 1:10), each = 26)),
                    100, replace = TRUE)