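I am looking for a way to create subsets of words from a word list, where each subset holds the words that contain a specific letter.
I know I can use the gregexpr function to check whether a letter occurs in a word, but I have not been able to build the subsets of words that contain a given letter.
I have been able to get the total count of each letter in the word list: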
> letters_table2<-table(unlist(strsplit(newdata2, ""), use.names=FALSE))
> letters_table2
 a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  w  x  y  z
14  9 11  8 11  6  4  7 12  3  3  9 14  7  9  8  6 13 13  6  7  8  4  7  8  3
From newdata2 I would like to create one word list per letter: the words containing a, the words containing b, and so on.
> newdata2
[1] "ae" "aj" "al" "an" "av" "av" "ay" "ba" "bd" "bd" "bk" "bl" "bv" "ca" "cl" "cm" "co"
[18] "cr" "cy" "dh" "dl" "dm" "ea" "ec" "ef" "er" "ex" "ex" "ez" "fm" "fo" "ft" "gi" "gy"
[35] "hb" "hm" "hr" "hr" "hs" "id" "in" "io" "iq" "ir" "ir" "it" "iz" "ja" "js" "kn" "lc"
[52] "ld" "le" "lp" "ls" "me" "mg" "mh" "mi" "mi" "mm" "mo" "ms" "nf" "nw" "ny" "ok" "op"
[69] "ox" "pa" "pi" "pr" "ps" "ps" "py" "qc" "qf" "qm" "qu" "qy" "rn" "rr" "rs" "rt" "ru"
[86] "sa" "so" "ss" "ts" "uc" "us" "uu" "ux" "vb" "vc" "vv" "vw" "wb" "wg" "xe" "xo" "xt"
[103] "yd" "yt" "za"
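Answer 0 (score: 1)
I would suggest the following, where x is your character vector of words (newdata2 in the question):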
setNames(lapply(letters, function(y) grep(y, x, value = TRUE)), letters)
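To make a single iteration of that lapply concrete, here is a minimal sketch for one letter. The vector `words` is a made-up stand-in for x, and `fixed = TRUE` is an optional addition (not part of the suggestion above) that treats the pattern as a literal string rather than a regular expression. With `value = TRUE`, grep returns the matching words themselves instead of their positions; indexing with grepl gives the same subset.

```r
# Hypothetical stand-in for x, for illustration only
words <- c("ae", "ba", "cl", "ca", "dh")

# Words containing the letter "a" (value = TRUE returns the words, not indices)
with_a <- grep("a", words, value = TRUE, fixed = TRUE)

# Equivalent subset built from a logical index
with_a2 <- words[grepl("a", words, fixed = TRUE)]

print(with_a)
## [1] "ae" "ba" "ca"
```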
Here is a simple example that uses only 5 letters instead of all 26.
set.seed(1)
mydata <- paste0(sample(letters[1:5], 15, TRUE),
                 sample(letters[1:5], 15, TRUE))
table(unlist(strsplit(mydata, ""), use.names = FALSE))
##
##  a  b  c  d  e
##  4 11  2  7  6
setNames(lapply(letters[1:5], function(y) {
  grep(y, mydata, value = TRUE)
}), letters[1:5])
## $a
## [1] "da" "ab" "aa"
##
## $b
## [1] "bc" "bd" "eb" "bd" "eb" "ab" "bb" "db" "be" "db"
##
## $c
## [1] "bc" "ce"
##
## $d
## [1] "bd" "bd" "dd" "da" "db" "db"
##
## $e
## [1] "ce" "eb" "ee" "eb" "be"
##
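One caveat on reproducing this: R 3.6.0 changed the default algorithm used by sample(), so set.seed(1) on a current installation will most likely draw different words than the ones listed. As a version-independent sketch, the block below hard-codes a vector that is consistent with the listing above (the order of the words is my reconstruction, not something the output states directly) and adds `fixed = TRUE`, which is my own optional tweak. The name `by_letter` is likewise just illustrative.

```r
# Hard-coded words, chosen to be consistent with the output listed above
mydata <- c("bc", "bd", "ce", "eb", "bd", "ee", "eb", "dd",
            "da", "ab", "bb", "aa", "db", "be", "db")

# One subset per letter; fixed = TRUE matches each letter literally
by_letter <- setNames(lapply(letters[1:5], function(y) {
  grep(y, mydata, value = TRUE, fixed = TRUE)
}), letters[1:5])

# Number of words containing each letter
lengths(by_letter)
##  a  b  c  d  e
##  3 10  2  6  5
```

These counts are words, not letter occurrences, which is why they are smaller than the table() totals: a word such as "aa" adds two to the letter count for a but appears only once in by_letter$a.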