In R

Time: 2017-07-24 06:47:44

Tags: r

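I am trying to do a rolling count of the values in variable one or two within each group of x.

Basically, in this example I want new_var1 and new_var2 returned, where every occurrence of a value is counted per group: each time Var1 or Var2 contains a in group f, b in group f, and so on. The overall appearance of a in each group is counted regardless of whether a shows up in Var1 or in Var2. However, the count must be assigned to the right column: if a shows up in column Var1, the actual count must be assigned to column new_var1, and if it shows up in Var2, to new_var2. For example, in row 9 (Var2 = c, Var1 = a, group f), c has already appeared once in group f and a four times, so new_var2 should be 2 and new_var1 should be 5.

x <- expand.grid(letters[1:5], letters[1:5], KEEP.OUT.ATTRS = FALSE)
x <- x[x[, 1] != x[, 2], c(2, 1)]
x <- data.frame(x, group = as.character(rep(letters[c(1, 2, 1, 4, 1) + 5], each = 4)))
# expected counts for the values in Var2 and in Var1
x <- data.frame(x, new_var2 = c(1,2,3,4,1,2,3,4,2,3,4,5,1,2,3,4,3,4,5,6))
x <- data.frame(x, new_var1 = c(1,1,1,1,1,1,1,1,5,2,2,2,1,1,1,1,6,3,6,3))

  Var2 Var1 group new_var2 new_var1
    a    b     f        1        1
    a    c     f        2        1
    a    d     f        3        1
    a    e     f        4        1
    b    a     g        1        1
    b    c     g        2        1
    b    d     g        3        1
    b    e     g        4        1
    c    a     f        2        5
    c    b     f        3        2
    c    d     f        4        2
    c    e     f        5        2
    d    a     i        1        1
    d    b     i        2        1
    d    c     i        3        1
    d    e     i        4        1
    e    a     f        3        6
    e    b     f        4        3
    e    c     f        5        6
    e    d     f        6        3

Any help is much appreciated.

I managed to get this to work:

library(data.table)
x <- data.table(x)
x[, new_var1a := seq(.N), by = c('Var1', 'group')]
x[, new_var2a := seq(.N), by = c('Var2', 'group')]

    Var2 Var1 group new_var2 new_var1 new_var1a new_var2a
 1:    a    b     f        1        1         1         1
 2:    a    c     f        2        1         1         2
 3:    a    d     f        3        1         1         3
 4:    a    e     f        4        1         1         4
 5:    b    a     g        1        1         1         1
 6:    b    c     g        2        1         1         2
 7:    b    d     g        3        1         1         3
 8:    b    e     g        4        1         1         4
 9:    c    a     f        2        5         1         1
10:    c    b     f        3        2         2         2
11:    c    d     f        4        2         2         3
12:    c    e     f        5        2         2         4
13:    d    a     i        1        1         1         1
14:    d    b     i        2        1         1         2
15:    d    c     i        3        1         1         3
16:    d    e     i        4        1         1         4
17:    e    a     f        3        6         2         1
18:    e    b     f        4        3         3         2
19:    e    c     f        5        6         2         3
20:    e    d     f        6        3         3         4

But it treats Var1 and Var2 independently, which is not what I want.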

3 answers:

Answer 0 (score: 1)

Your problem is more of an algorithmic one, so we will use a loop rather than dplyr or data.table. For me, using loops in R often means using Rcpp. So here is my answer:

// [[Rcpp::depends(BH)]]
#include <Rcpp.h>
#include <boost/foreach.hpp>
using namespace Rcpp;

// the C-style upper-case macro name is a bit ugly
#define foreach BOOST_FOREACH

// [[Rcpp::export]]
ListOf<IntegerVector> new_vars(const IntegerVector& Var1,
                               const IntegerVector& Var2,
                               int n_Var,
                               ListOf<IntegerVector> ind_groups) {

  int nrow = Var1.size();
  IntegerVector new_var1a(nrow, NA_INTEGER); 
  IntegerVector new_var2a(nrow, NA_INTEGER); 

  for (int i = 0; i < ind_groups.size(); i++) {
    IntegerVector counts(n_Var);
    foreach(const int& j, ind_groups[i]) {
      new_var1a[j] = ++counts[Var1[j]];
      new_var2a[j] = ++counts[Var2[j]];
    }
  }

  return List::create(Named("new_var1a") = new_var1a, 
                      Named("new_var2a") = new_var2a);
}


/*** R
x <- expand.grid(letters[1:5],letters[1:5],
                 KEEP.OUT.ATTRS = FALSE, 
                 stringsAsFactors = FALSE)
x <- x[x[,1]!=x[,2],c(2,1)]
x <- data.frame(x,group=as.character(rep(letters[c(1,2,1,4,1)+5],each=4)))
x <- data.frame(x,new_var2 = c(1,2,3,4,1,2,3,4,2,3,4,5,1,2,3,4,3,4,5,6))
x <- data.frame(x,new_var1 = c(1,1,1,1,1,1,1,1,5,2,2,2,1,1,1,1,6,3,6,3))


getNewVars <- function(x) {

  Vars.levels <- unique(c(x$Var2, x$Var1))

  new_vars <- new_vars(
      Var1 = match(x$Var1, Vars.levels) - 1,
      Var2 = match(x$Var2, Vars.levels) - 1,
      n_Var = length(Vars.levels), 
      ind_groups = split(seq_along(x$group) - 1, x$group)
  )

  cbind(x, new_vars)
}

getNewVars(x)
*/

Put this in a ".cpp" file and source it (for example with Rcpp::sourceCpp()).

PS: make sure to use stringsAsFactors = FALSE.
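For readers without a C++ toolchain, the same per-group loop can be sketched in plain R. This is only an illustration and assumes that neither column contains NA. The name count_by_group() is a hypothetical helper, not part of Rcpp or of the code above, and it is likely to be slower than the compiled version on large data.

```r
# Plain-R sketch of the per-group counting loop (no Rcpp needed).
# count_by_group() is a hypothetical helper name used only for illustration.
count_by_group <- function(var1, var2, group) {
  var1 <- as.character(var1)
  var2 <- as.character(var2)
  n <- length(var1)
  new_var1 <- integer(n)
  new_var2 <- integer(n)
  # Visit the rows of each group in their original order.
  for (rows in split(seq_len(n), group)) {
    # One running counter per distinct value seen in this group.
    values <- unique(c(var1[rows], var2[rows]))
    counts <- setNames(integer(length(values)), values)
    for (j in rows) {
      counts[var1[j]] <- counts[var1[j]] + 1L
      new_var1[j] <- counts[var1[j]]
      counts[var2[j]] <- counts[var2[j]] + 1L
      new_var2[j] <- counts[var2[j]]
    }
  }
  data.frame(new_var1 = new_var1, new_var2 = new_var2)
}

# Toy input (made up): three rows in one group, "a" appears in both columns.
res <- count_by_group(var1  = c("b", "a", "a"),
                      var2  = c("a", "c", "b"),
                      group = c("f", "f", "f"))
print(res)
```

As in the Rcpp version, the counters are reset for every group, so a value is only counted against earlier rows of the same group.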

Answer 1 (score: 1)

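Here is a dplyr solution: first convert the data from wide to long format, keeping a row id so the result can be merged back later.

Example data

df = read.table(text="  Var2 Var1 group new_var2 new_var1
    a    b     f        1        1
    a    c     f        2        1
    a    d     f        3        1
    a    e     f        4        1
    b    a     g        1        1
    b    c     g        2        1
    b    d     g        3        1
    b    e     g        4        1
    c    a     f        2        5
    c    b     f        3        2
    c    d     f        4        2
    c    e     f        5        2
    d    a     i        1        1
    d    b     i        2        1
    d    c     i        3        1
    d    e     i        4        1
    e    a     f        3        6
    e    b     f        4        3
    e    c     f        5        6
    e    d     f        6        3",header=TRUE)

df = df[,c("Var2","Var1","group")]

Code

library(reshape2)
library(dplyr)

# keep a row id so the counts can be joined back to the wide data
df$id = seq(1,nrow(df))
# wide to long: one row per (id, variable), in original row order
df2 = melt(df, id.vars=c("id", "group")) %>% arrange(id)
# running count of each value within its group
df2 = df2 %>% group_by(group,value) %>% mutate(n= row_number())
# join the counts back, once for Var1 and once for Var2
df = df %>% left_join(df2[df2$variable=="Var1",c("id","n")], by="id")
df = df %>% left_join(df2[df2$variable=="Var2",c("id","n")], by="id")
colnames(df)[colnames(df)=="n.x"]="new_var1"
colnames(df)[colnames(df)=="n.y"]="new_var2"

If a row can contain the same value in both variables (which is not the case in your example), you can optionally add the following before the joins:

df2 = df2 %>% group_by(group,value,id) %>% mutate(n=max(n))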

Hope this helps!
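The wide-to-long idea can also be tried in base R, without reshape2 or dplyr. The sketch below is only an illustration on a three-row toy data frame (made-up data, not the table from the question), and it assumes that Var1 and Var2 differ within every row. The names long and n are chosen here for the example.

```r
# Toy data (made up for illustration): one group, three rows.
df <- data.frame(Var2  = c("a", "a", "c"),
                 Var1  = c("b", "c", "a"),
                 group = c("f", "f", "f"),
                 stringsAsFactors = FALSE)
n_rows <- nrow(df)

# Wide to long by hand: two long rows per original row, in row order.
long <- data.frame(
  id       = rep(seq_len(n_rows), each = 2),
  variable = rep(c("Var1", "Var2"), times = n_rows),
  group    = rep(df$group, each = 2),
  # rbind() interleaves the columns: Var1[1], Var2[1], Var1[2], Var2[2], ...
  value    = as.vector(rbind(df$Var1, df$Var2)),
  stringsAsFactors = FALSE
)

# Running count of each value within its group, in long-row order.
long$n <- ave(seq_along(long$value), long$group, long$value, FUN = seq_along)

# Back to wide: pick the counts that belong to each original column.
df$new_var1 <- long$n[long$variable == "Var1"]
df$new_var2 <- long$n[long$variable == "Var2"]
print(df)
```

Because the long rows are built in original row order, no explicit join on the row id is needed here; the id column is kept only to make the correspondence visible.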

Answer 2 (score: 1)

The dcast() function from the data.table package allows us to reshape multiple value variables simultaneously. This can be used to avoid the double left join in Florian's answer:

library(data.table)
long <- melt(setDT(x)[, rn := .I], id.vars = c("rn", "group"), 
             measure.vars = c("Var1", "Var2"), value.name = "Var")[
               , variable := rleid(variable)][
                 order(rn), new_var := rowid(group, Var)][]
dcast(long, rn + group ~ ..., value.var = c("Var", "new_var"))[, rn := NULL][]
    group Var_1 Var_2 new_var_1 new_var_2
 1:     f     b     a         1         1
 2:     f     c     a         1         2
 3:     f     d     a         1         3
 4:     f     e     a         1         4
 5:     g     a     b         1         1
 6:     g     c     b         1         2
 7:     g     d     b         1         3
 8:     g     e     b         1         4
 9:     f     a     c         5         2
10:     f     b     c         2         3
11:     f     d     c         2         4
12:     f     e     c         2         5
13:     i     a     d         1         1
14:     i     b     d         1         2
15:     i     c     d         1         3
16:     i     e     d         1         4
17:     f     a     e         6         3
18:     f     b     e         3         4
19:     f     c     e         6         5
20:     f     d     e         3         6

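Explanation

setDT(x) coerces x to a data.table; a column rn containing the row numbers is then added before reshaping from wide to long format. Just to get nicer-looking column names from the subsequent dcast(), variable is renamed with rleid() ([, variable := sub("Var", "", variable)] could be used as an alternative to [, variable := rleid(variable)]).

The important step is numbering the occurrences of each Var within each group using rowid(), applied to group and Var in original row order (order(rn)).

Now the result has two value columns. Finally, it is reshaped from long back to wide format, and the rn column, which is no longer needed, is removed.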
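To see what the rowid() step does in isolation, here is a minimal sketch on two made-up vectors (grp and val are illustrative names, not columns from the question). It assumes the data.table package is installed.

```r
library(data.table)

# Made-up input: group labels and the values observed in them, in row order.
grp <- c("f", "f", "g", "f", "f")
val <- c("a", "b", "a", "a", "b")

# rowid() numbers the occurrences of each (grp, val) combination
# in the order in which they appear.
occurrence <- rowid(grp, val)
print(occurrence)
# 1 1 1 2 2
```

The third element is 1 because "a" is seen for the first time in group g there, while the fourth element is 2 because it is the second "a" in group f.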

Data

x <- expand.grid(letters[1:5], letters[1:5], KEEP.OUT.ATTRS = FALSE)
x <- x[x[, 1] != x[, 2], c(2, 1)]
x <- data.frame(
  x, 
  group = as.character(rep(letters[c(1, 2, 1, 4, 1) + 5], each = 4)),
  new_var2 = c(1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 5, 1, 2, 3, 4, 3, 4, 5, 6),
  new_var1 = c(1, 1, 1, 1, 1, 1, 1, 1, 5, 2, 2, 2, 1, 1, 1, 1, 6, 3, 6, 3))