I have a list of data.frames (d) that looks like this:
$ 1 :'data.frame': 1 obs. of 2 variables:
 ..$ index: int 2
 ..$ V1 : Factor w/ 125 levels "cgtsloqasmlkjybjlo,..:"
$ 2 :'data.frame': 1 obs. of 2 variables:
 ..$ index: int 2
 ..$ V1 : Factor w/ 125 levels "ponlohlofdctlo,..:"
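and so on, for 1000 data.frames. I need to count the number of unique letters that occur in "cgtsloqasmlkjybjlo,..:", in "ponlohlofdctlo,..:", and in each of the other 1000 data.frames. I tried writing a function for this, but I am not an expert and it does not work.
As a first step I tried to split the strings into characters (this does not work either):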
chars = sapply(d, function(x) strsplit(as.character(d),""))
In addition, I need to count the number of occurrences of "lo" in "cgtsloqasmlkjybjlo,..:", in "ponlohlofdctlo,..:", and in each of the other 1000 sequences.
EDIT: the desired output would be a data.frame:
Seq                   length(unique_letters)   lo_occurrences
cgtsloqasmlkjybjlo    13                       2
ponlohlofdctlo        9                        3
..............        ............             ............
Output of dput(d[1:3]):
structure(list(`1` = structure(1000L, .Label = c("jhgfilsouilohgucaksfiaaknajdauloadbayrzjdhad", "fjkhqurtglowqgbdahhmolovdethabvfdalo", "......", "V1"), class = "factor")), .Names = c("1", "2", "3"))
Answer 0: (score: 1)
One way is:
#simulating your list; I got an error trying to use your dput
d <- list(data.frame(index = 2, V1 = "cgtsloqasmlkjybjlo"),
data.frame(index = 2, V1 = "ponlohlofdctlo"))
d
#[[1]]
# index V1
#1 2 cgtsloqasmlkjybjlo
#[[2]]
# index V1
#1 2 ponlohlofdctlo
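# gregexpr() returns -1 when there is no match, so count the positive match
# positions; length() alone would report 1 for a string that contains no "lo"
res <- do.call(rbind, lapply(d, function(x) data.frame(seq = as.character(x$V1),
          length_uniques = length(unique(unlist(strsplit(as.character(x$V1), "")))),
          lo_counts = sum(unlist(gregexpr("lo", as.character(x$V1), fixed = TRUE)) > 0))))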
res
# seq length_uniques lo_counts
#1 cgtsloqasmlkjybjlo 13 2
#2 ponlohlofdctlo 9 3
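
With 1000 data.frames it may be more convenient to pull the sequences out once and work on the resulting character vector. The following is only a sketch under a few assumptions: every element of the list has exactly one row and a column named V1 (as in the str() output in the question), and the helper name count_letters and its pattern argument are made up for illustration; they are not part of the original answer.

```r
# Hypothetical helper (not from the original answer): one summary row per data.frame.
# Assumes each data.frame has exactly one row and a column called V1.
count_letters <- function(d, pattern = "lo") {
  # Extract the single V1 value from each data.frame; as.character() handles factors
  seqs <- vapply(d, function(x) as.character(x$V1), character(1), USE.NAMES = FALSE)

  # Number of distinct characters in each sequence
  n_unique <- vapply(strsplit(seqs, ""), function(ch) length(unique(ch)), integer(1))

  # gregexpr() returns -1 for "no match", so count only positive match positions
  n_pattern <- vapply(gregexpr(pattern, seqs, fixed = TRUE),
                      function(m) sum(m > 0), integer(1))

  data.frame(seq = seqs, length_uniques = n_unique, lo_counts = n_pattern,
             stringsAsFactors = FALSE)
}

# Usage: the two sequences from the question plus one that contains no "lo"
d2 <- list(data.frame(index = 2, V1 = "cgtsloqasmlkjybjlo"),
           data.frame(index = 2, V1 = "ponlohlofdctlo"),
           data.frame(index = 2, V1 = "abc"))
res2 <- count_letters(d2)
```

For the third, made-up sequence "abc" this gives 3 unique letters and 0 occurrences of "lo". Note that gregexpr() counts non-overlapping matches, which makes no difference for "lo" but would for a pattern such as "aa".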