How do I compute the length of the concatenated strings in one column, grouped by the ID in another column?

Time: 2016-04-23 00:22:32

Tags: r matrix concatenation

I have the following function:

set.seed(1984)
test <- function(paths){
  x <- matrix(rep(NA, paths*3), ncol = 3, 
              dimnames = list(c(), c("Cookie", "Site", "Count")))
  for(i in 1:paths){
    x[i, 1] <- round(sqrt(rnorm(1,50,100)^2))
    n <- function(){sample(1:10, size = 1)}
    draws <- function(){sample(LETTERS[1:5], n(), replace = T)}
    x[i, 2] <- paste(draws(), collapse = '-')
    }
  return(x)
}

It produces output like this:

     Cookie Site                  Count
[1,] "91"   "B-D-E-A"             NA   
[2,] "37"   "E-A-D"               NA   
[3,] "108"  "B"                   NA   
[4,] "93"   "D-A-D"               NA   
[5,] "157"  "E-C"                 NA   
[6,] "52"   "B-C-D-A-C-C-B-A-B-E" NA

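For each unique Cookie ID in the Cookie column, I want to:

  1. Concatenate all of its Site strings together (the Cookie column contains repeated values)
  2. Get the length of the concatenated string
  3. Put that TOTAL length in the Count column for every row with that Cookie ID (so the value may repeat)

Any ideas?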

2 answers:

Answer 0: (score: 1)

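This groups your matrix by Cookie and returns the total number of characters in the Site column for each group (which equals the length of the concatenation).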

test.df <- test(91)
library(dplyr)
test.df %>% 
  as.data.frame(., stringsAsFactors = FALSE) %>% 
  group_by(Cookie) %>% 
  mutate(Count = sum(nchar(Site)))

If you want Count to exclude the - character, just replace Site with gsub("-", "", Site, fixed = TRUE).
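
For readers without dplyr, the same per-group total can be sketched in base R with ave(). This is only an illustration on a small hand-made data frame (the three rows below are hypothetical, not output of test()), and the Count_nodash column name is made up for the example. It shows the difference between counting with and without the - separators.

```r
# Hypothetical example data: Cookie "26" appears twice
x <- data.frame(
  Cookie = c("26", "43", "26"),
  Site   = c("D-D-E-E-C", "D-D-A", "E-E-A-C-A"),
  stringsAsFactors = FALSE
)

# Characters per row, with and without the "-" separators
with_dash    <- nchar(x$Site)
without_dash <- nchar(gsub("-", "", x$Site, fixed = TRUE))

# ave() applies sum() within each Cookie group and returns one value per row,
# so the group total is repeated on every row of that Cookie
x$Count        <- ave(with_dash, x$Cookie, FUN = sum)
x$Count_nodash <- ave(without_dash, x$Cookie, FUN = sum)
x
```

Both rows for Cookie "26" end up with the same total, which is the repeated value asked for in the question.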

Answer 1: (score: 1)

Using data.table, we can compute the count for each Cookie. If we also need the full path, with the dashes replaced by commas, a second step builds it:

library(data.table)
dt <- as.data.table(test(91))[, Count := as.character(sum(nchar(gsub("-", "", Site)))) , 
                    by = Cookie][]

dt[, Full_path := gsub("-", ", ", toString(Site)), by = Cookie]
head(dt)
#   Cookie          Site Count                    Full_path
#1:    258             A     1                            A
#2:     26     D-D-E-E-C    10 D, D, E, E, C, E, E, A, C, A
#3:     43         D-D-A     3                      D, D, A
#4:    171 C-C-E-A-B-D-E     7          C, C, E, A, B, D, E
#5:     57       A-D-D-C     4                   A, D, D, C
#6:    156           A-D     2                         A, D
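
To see how the Full_path step behaves within one group, here is a minimal base R sketch. The two Site values are hypothetical stand-ins for the rows of a single Cookie, and the variable names are chosen only for this illustration. toString() joins the values with ", ", and gsub() then turns every remaining dash into the same separator.

```r
# Hypothetical Site values belonging to one Cookie
sites <- c("D-D-E-E-C", "E-E-A-C-A")

# toString() collapses the vector into one string separated by ", "
joined <- toString(sites)

# Replace each "-" with ", " so every visited site is comma separated
full_path <- gsub("-", ", ", joined)

# Number of sites in the path; matches sum(nchar(gsub("-", "", sites)))
n_sites <- length(strsplit(full_path, ", ", fixed = TRUE)[[1]])
full_path
n_sites
```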