Counting characters and occurrences of a specific substring pattern

Time: 2013-11-04 21:59:57

Tags: r

I have a list of data.frames (d) that looks like this:

 $ 1  :'data.frame':   1 obs. of  2 variables:
  ..$ index: int 2
  ..$ V1   : Factor w/ 125 levels "cgtsloqasmlkjybjlo,..:"
 $ 2  :'data.frame':   1 obs. of  2 variables:
  ..$ index: int 2
  ..$ V1   : Factor w/ 125 levels "ponlohlofdctlo,..:"

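And so on for 1000 data.frames. I have to count the number of unique letters that occur in "cgtsloqasmlkjybjlo,..:", in "ponlohlofdctlo,..:", and in each of the other 1000 data.frames. I tried writing a function for this, but I am not an expert and it does not work.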

In any case, I tried to split the strings (but this does not work either):

 chars = sapply(d, function(x) strsplit(as.character(d),"")) 

In addition, I have to count the number of occurrences of "lo" in "cgtsloqasmlkjybjlo,..:", in "ponlohlofdctlo,..:", and in the other 1000 data.frames.

Edit: the desired output would be a data.frame:

        Seq           length(unique_letters)   lo_occurrences
 cgtsloqasmlkjybjlo           13                       2      
   ponlohlofdctlo             9                        3     
   ..............           ............         ............    


 dput output: 
  dput(d[1:3])
     

structure(list(1 = structure(1000L, .Label = c("jhgfilsouilohgucaksfiaaknajdauloadbayrzjdhad", "fjkhqurtglowqgbdahhmolovdethabvfdalo", "......", "V1"), class = "factor")), .Names = c("1", "2", "3"))

1 answer:

Answer 0: (score: 1)

One way to do it:

#simulating your list; I got an error trying to use your dput
d <- list(data.frame(index = 2, V1 = "cgtsloqasmlkjybjlo"), 
      data.frame(index = 2, V1 = "ponlohlofdctlo"))
d
#[[1]]
#  index                 V1
#1     2 cgtsloqasmlkjybjlo

#[[2]]
#  index             V1
#1     2 ponlohlofdctlo

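res <- do.call(rbind, lapply(d, function(x) {
  s <- as.character(x$V1)
  data.frame(seq = s,
             #split into single characters and count the distinct ones
             length_uniques = length(unique(unlist(strsplit(s, "")))),
             #gregexpr() returns -1 when nothing matches, so count only positive positions
             lo_counts = sum(unlist(gregexpr("lo", s, fixed = TRUE)) > 0))
}))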
res
#                 seq length_uniques lo_counts
#1 cgtsloqasmlkjybjlo             13         2
#2     ponlohlofdctlo              9         3
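
With 1000 data.frames it may be more convenient to pull all the sequences into one character vector first and then count in a vectorized way. The following is only a sketch under that assumption: the helper names `count_unique_chars` and `count_pattern` are made up for illustration and are not part of the original answer. It also returns 0, rather than 1, for a sequence that does not contain the pattern, because `regmatches()` yields an empty vector when `gregexpr()` finds nothing.

```r
# Hypothetical helpers; the names are illustrative only.
count_unique_chars <- function(s) {
  # Split each string into single characters and count the distinct ones
  vapply(strsplit(s, ""), function(ch) length(unique(ch)), integer(1))
}

count_pattern <- function(s, pattern) {
  # regmatches() gives character(0) when there is no match, so lengths() is 0
  lengths(regmatches(s, gregexpr(pattern, s, fixed = TRUE)))
}

# Same simulated list as in the answer above
d <- list(data.frame(index = 2, V1 = "cgtsloqasmlkjybjlo"),
          data.frame(index = 2, V1 = "ponlohlofdctlo"))

# Collect every sequence from the list into one character vector
seqs <- unlist(lapply(d, function(x) as.character(x$V1)))

res2 <- data.frame(seq = seqs,
                   length_uniques = count_unique_chars(seqs),
                   lo_counts = count_pattern(seqs, "lo"),
                   stringsAsFactors = FALSE)
res2
```

Note that `fixed = TRUE` treats "lo" as a literal string rather than a regular expression, and that overlapping matches are not counted, which makes no difference for a pattern like "lo".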