Count of full and mini videos per user ID

Time: 2017-06-30 12:56:40

Tags: r

Data:

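Hi all,

My data is close to 4GB in size, with the columns UserID, MediaID and Full/Mini. I want to know how many full and mini episodes each user watched, so that each row of the result gives the number of Full and Mini episodes for one user. Also, please let me know how to do this more efficiently, because the data is huge and slows processing down.

Thanks. Any help would be highly appreciated.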

dat=data.frame(id=c("a","a","a","b","c"), media_id=c("1a","1b","1c","2b","2c"), Full_mini=c("ful","ful","mini","mini","full"))

1 answer:

Answer 0 (score: 0)

You can do it with table.

subset(as.data.frame(table(dat[-2])),Freq>0)
#   id Full_mini Freq
# 1  a       ful    2
# 6  c      full    1
# 7  a      mini    1
# 8  b      mini    1

The typos ("ful" vs. "full") are yours!
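Because of that typo, "ful" and "full" are counted as separate categories above. A minimal sketch of normalizing the labels before counting, assuming "ful" is simply a misspelling of "full" (that mapping is an assumption, not something the question confirms):

```r
# Toy data from the question, including the "ful" typo.
dat <- data.frame(
  id = c("a", "a", "a", "b", "c"),
  media_id = c("1a", "1b", "1c", "2b", "2c"),
  Full_mini = c("ful", "ful", "mini", "mini", "full")
)

# Work on a character column so the replacement also behaves when
# the column was read in as a factor (the default before R 4.0).
dat$Full_mini <- as.character(dat$Full_mini)

# Assumption: "ful" is a misspelling of "full".
dat$Full_mini[dat$Full_mini == "ful"] <- "full"

# Same counting step as above, now with only two categories.
counts <- subset(as.data.frame(table(dat[-2])), Freq > 0)
counts
#   id Full_mini Freq
# 1  a      full    2
# 3  c      full    1
# 4  a      mini    1
# 5  b      mini    1
```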

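If that is still too slow, split the data in two. Your dataset has the lucky property that its last column takes only two possible values, so you end up with two smaller datasets in which you only need to count over a single column, which should be fast.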

dat_full <- subset(dat,Full_mini == "full" | Full_mini == "ful") 
dat_mini <- subset(dat,Full_mini == "mini")
library(magrittr)
res_full <- dat_full$id %>% 
  table %>% 
  as.data.frame %>%
  subset(Freq>0) %>%
  transform(Full_mini = "full") %>%
  setNames(c("id","Freq","Full_mini"))

res_mini <- dat_mini$id %>% 
  table %>% 
  as.data.frame %>%
  subset(Freq>0) %>%
  transform(Full_mini = "mini") %>%
  setNames(c("id","Freq","Full_mini"))

res <- rbind(res_full,res_mini)

Or side by side:

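# Use a shared set of factor levels so both tables have one row per id,
# in the same order (needed when id is a character column, as in R >= 4.0).
ids <- sort(unique(as.character(dat$id)))

res_full <- dat_full$id %>% 
  factor(levels = ids) %>%
  table %>% 
  as.data.frame

res_mini <- dat_mini$id %>% 
  factor(levels = ids) %>%
  table %>% 
  as.data.frame

res <- setNames(cbind(res_full[1:2],res_mini[2]),c("id","full","mini"))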

#   id full mini
# 1  a    2    1
# 2  b    0    1
# 3  c    1    0
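The same side-by-side layout can also be built in one step from a two-way table, without splitting the data first. This is a sketch in base R, assuming the "ful" typo has already been corrected in the data. It is an alternative to the split approach above, not a benchmarked improvement for the 4GB case.

```r
# Toy data from the question, with "ful" already corrected to "full" (assumption).
dat <- data.frame(
  id = c("a", "a", "a", "b", "c"),
  media_id = c("1a", "1b", "1c", "2b", "2c"),
  Full_mini = c("full", "full", "mini", "mini", "full")
)

# Two-way contingency table: one row per id, one column per Full_mini value.
tab <- table(dat$id, dat$Full_mini)

# Convert the table to a plain data frame and move the ids into a column.
wide <- as.data.frame.matrix(tab)
wide <- data.frame(id = rownames(wide), wide, row.names = NULL)
wide
#   id full mini
# 1  a    2    1
# 2  b    0    1
# 3  c    1    0
```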