I have the following function:
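set.seed(1984)
test <- function(paths){
  # One row per path; the matrix becomes character once strings are assigned
  x <- matrix(rep(NA, paths*3), ncol = 3,
              dimnames = list(c(), c("Cookie", "Site", "Count")))
  for(i in 1:paths){
    # Cookie ID: absolute value of a normal draw, rounded
    x[i, 1] <- round(sqrt(rnorm(1,50,100)^2))
    # Path length (1 to 10) and the sites visited (A to E, with replacement)
    n <- function(){sample(1:10, size = 1)}
    draws <- function(){sample(LETTERS[1:5], n(), replace = TRUE)}
    x[i, 2] <- paste(draws(), collapse = '-')
  }
  return(x)
}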
which produces output like this:
     Cookie Site                  Count
[1,] "91"   "B-D-E-A"             NA
[2,] "37"   "E-A-D"               NA
[3,] "108"  "B"                   NA
[4,] "93"   "D-A-D"               NA
[5,] "157"  "E-C"                 NA
[6,] "52"   "B-C-D-A-C-C-B-A-B-E" NA
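For each unique Cookie ID in the Cookie column (Cookie contains duplicate values), I want the Site strings concatenated together, and I want Count filled in with the value for that Cookie ID (so it may repeat across rows). Any ideas?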
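Answer 0: (score: 1)
This groups your matrix by Cookie and returns the total number of characters in the Site column for each group (equal to the length of the concatenated string).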
test.df <- test(91)
library(dplyr)
test.df %>%
as.data.frame(., stringsAsFactors = FALSE) %>%
group_by(Cookie) %>%
mutate(Count = sum(nchar(Site)))
If you want Count to exclude the - character, just replace Site with gsub("-", "", Site, fixed = TRUE).
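To see what that substitution changes, here is a minimal base R sketch using three Site strings taken from the question's sample output. The variable names site, with_dashes and letters_only are illustrative only and do not come from the answer.

```r
# Three Site strings copied from the question's sample output
site <- c("B-D-E-A", "E-A-D", "B")

# nchar() on the raw string counts the letters and the dashes
with_dashes <- nchar(site)

# Removing the dashes first counts the letters (site visits) only
letters_only <- nchar(gsub("-", "", site, fixed = TRUE))

print(data.frame(site, with_dashes, letters_only))
```

Within a group, sum() over either vector gives the corresponding Count, so the choice only decides whether the separators are counted.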
Answer 1: (score: 1)
Using data.table, we can do this as follows. If we also need the full path, it can be added as a Full_path column:
library(data.table)
dt <- as.data.table(test(91))[, Count := as.character(sum(nchar(gsub("-", "", Site)))) ,
by = Cookie][]
dt[, Full_path := gsub("-", ", ", toString(Site)), by = Cookie]
head(dt)
# Cookie Site Count Full_path
#1: 258 A 1 A
#2: 26 D-D-E-E-C 10 D, D, E, E, C, E, E, A, C, A
#3: 43 D-D-A 3 D, D, A
#4: 171 C-C-E-A-B-D-E 7 C, C, E, A, B, D, E
#5: 57 A-D-D-C 4 A, D, D, C
#6: 156 A-D 2 A, D
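
For comparison, the same grouping can be sketched in base R with ave(), which applies a function within each group and returns a result aligned with the original rows. This is only a sketch under assumptions, not part of the answer above: the toy data frame is hypothetical and was chosen so that one cookie appears twice.

```r
# Hypothetical toy data (not from the question): cookie "26" appears twice
toy <- data.frame(
  Cookie = c("26", "43", "26"),
  Site   = c("D-D-E-E-C", "D-D-A", "E-E-A-C-A"),
  stringsAsFactors = FALSE
)

# Letters per row, ignoring the dashes
letters_per_row <- nchar(gsub("-", "", toy$Site, fixed = TRUE))

# Sum the per-row counts within each Cookie; the total repeats on every row of the group
toy$Count <- ave(letters_per_row, toy$Cookie, FUN = sum)

# Join all Site strings of a Cookie, then turn the dashes into ", "
toy$Full_path <- ave(
  toy$Site, toy$Cookie,
  FUN = function(s) gsub("-", ", ", toString(s), fixed = TRUE)
)

print(toy)
```

Both rows for cookie "26" end up with the same Count and the same Full_path, which is why a repeated cookie can show a Full_path longer than its own Site string.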