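I am trying to compute a rolling count of the values in `Var1` or `Var2` within each group.
Basically, in this example I want `new_var1` and `new_var2` returned: the count goes up each time `Var1` or `Var2` contains `a` in combination with group `f`, and likewise for `b` within its group, and so on. So every appearance of `a` within a group is counted, regardless of whether `a` appears in column `Var1` or `Var2`. However, the count must be assigned to the correct column: if `a` appears in column `Var1`, the running count must go in `new_var1`, and for `a` in `Var2` the running count should go in `new_var2`.
Example data with the expected counts:
x <- expand.grid(letters[1:5],letters[1:5],KEEP.OUT.ATTRS = FALSE)
x <- x[x[,1]!=x[,2],c(2,1)]
x <- data.frame(x,group=as.character(rep(letters[c(1,2,1,4,1)+5],each=4)))
x<- data.frame(x,new_var1 = c(1,2,3,4,1,2,3,4,2,3,4,5,1,2,3,4,3,4,5,6))
x<- data.frame(x,new_var2 = c(1,1,1,1,1,1,1,1,5,2,2,2,1,1,1,1,6,3,6,3))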
Var2 Var1 group new_var2 new_var1
a b f 1 1
a c f 2 1
a d f 3 1
a e f 4 1
b a g 1 1
b c g 2 1
b d g 3 1
b e g 4 1
c a f 2 5
c b f 3 2
c d f 4 2
c e f 5 2
d a i 1 1
d b i 2 1
d c i 3 1
d e i 4 1
e a f 3 6
e b f 4 3
e c f 5 6
e d f 6 3
My attempt so far with data.table:
library(data.table)
x <- data.table(x)
x[, new_var1a := seq(.N) , by = c('Var1','group')]
x[, new_var2a := seq(.N) , by = c('Var2','group')]
Var2 Var1 group new_var2 new_var1 new_var1a new_var2a
1: a b f 1 1 1 1
2: a c f 2 1 1 2
3: a d f 3 1 1 3
4: a e f 4 1 1 4
5: b a g 1 1 1 1
6: b c g 2 1 1 2
7: b d g 3 1 1 3
8: b e g 4 1 1 4
9: c a f 2 5 1 1
10: c b f 3 2 2 2
11: c d f 4 2 2 3
12: c e f 5 2 2 4
13: d a i 1 1 1 1
14: d b i 2 1 1 2
15: d c i 3 1 1 3
16: d e i 4 1 1 4
17: e a f 3 6 2 1
18: e b f 4 3 3 2
19: e c f 5 6 2 3
20: e d f 6 3 3 4
The data.table code above runs, but it treats `Var1` and `Var2` independently, which is not what I want.
Any help is much appreciated.
Answer 0 (score: 1)
Your problem is more of an algorithmic one, so we will use a loop rather than dplyr or data.table. For me, using loops in R often means using Rcpp. So here is my answer:
// [[Rcpp::depends(BH)]]
#include <Rcpp.h>
#include <boost/foreach.hpp>
using namespace Rcpp;

// the C-style upper-case macro name is a bit ugly
#define foreach BOOST_FOREACH

// [[Rcpp::export]]
ListOf<IntegerVector> new_vars(const IntegerVector& Var1,
                               const IntegerVector& Var2,
                               int n_Var,
                               ListOf<IntegerVector> ind_groups) {
  int nrow = Var1.size();
  IntegerVector new_var1a(nrow, NA_INTEGER);
  IntegerVector new_var2a(nrow, NA_INTEGER);
  for (int i = 0; i < ind_groups.size(); i++) {
    IntegerVector counts(n_Var);
    foreach(const int& j, ind_groups[i]) {
      new_var1a[j] = ++counts[Var1[j]];
      new_var2a[j] = ++counts[Var2[j]];
    }
  }
  return List::create(Named("new_var1a") = new_var1a,
                      Named("new_var2a") = new_var2a);
}

/*** R
x <- expand.grid(letters[1:5],letters[1:5],
                 KEEP.OUT.ATTRS = FALSE,
                 stringsAsFactors = FALSE)
x <- x[x[,1]!=x[,2],c(2,1)]
x <- data.frame(x,group=as.character(rep(letters[c(1,2,1,4,1)+5],each=4)))
x <- data.frame(x,new_var1 = c(1,2,3,4,1,2,3,4,2,3,4,5,1,2,3,4,3,4,5,6))
x <- data.frame(x,new_var2 = c(1,1,1,1,1,1,1,1,5,2,2,2,1,1,1,1,6,3,6,3))

getNewVars <- function(x) {
  Vars.levels <- unique(c(x$Var2, x$Var1))
  new_vars <- new_vars(
    Var1 = match(x$Var1, Vars.levels) - 1,
    Var2 = match(x$Var2, Vars.levels) - 1,
    n_Var = length(Vars.levels),
    ind_groups = split(seq_along(x$group) - 1, x$group)
  )
  cbind(x, new_vars)
}

getNewVars(x)
*/
Put this in a ".cpp" file and source it (for example with `Rcpp::sourceCpp()`).
PS: make sure to use `stringsAsFactors = FALSE`.
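For readers without a C++ toolchain, the same loop can be sketched in plain R. This is only an illustration of the counting logic described above (one set of counters per group, incremented first for `Var1` and then for `Var2` in each row); the helper name `rolling_pair_count` and the output column names are hypothetical and not part of the answer, and it will likely be slower than the compiled version on large data.

```r
# Hypothetical pure-R helper mirroring the Rcpp loop: one named counter
# vector per group, bumped for Var1 and then for Var2 in every row.
rolling_pair_count <- function(var1, var2, group) {
  var1 <- as.character(var1)
  var2 <- as.character(var2)
  group <- as.character(group)
  lv <- unique(c(var1, var2))
  # zero-initialised counters, one vector per distinct group
  counts <- lapply(split(group, group), function(g) {
    setNames(integer(length(lv)), lv)
  })
  new_var1 <- integer(length(var1))
  new_var2 <- integer(length(var2))
  for (i in seq_along(var1)) {
    g <- group[i]
    cnt <- counts[[g]]
    cnt[[var1[i]]] <- cnt[[var1[i]]] + 1L
    new_var1[i] <- cnt[[var1[i]]]
    cnt[[var2[i]]] <- cnt[[var2[i]]] + 1L
    new_var2[i] <- cnt[[var2[i]]]
    counts[[g]] <- cnt
  }
  data.frame(new_var1 = new_var1, new_var2 = new_var2)
}

# Same example data as in the question, without the expected columns.
x <- expand.grid(letters[1:5], letters[1:5],
                 KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
x <- x[x[, 1] != x[, 2], c(2, 1)]
x$group <- rep(letters[c(1, 2, 1, 4, 1) + 5], each = 4)

res <- rolling_pair_count(x$Var1, x$Var2, x$group)
cbind(x, res)
```

Here each output column is named after the input column it belongs to, so `new_var1` holds the running counts for the values found in `Var1`.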
Answer 1 (score: 1)
A dplyr solution: first convert the data from wide to long format, keeping a row ID so that the results can be merged back later.
Sample data
df = read.table(text=" Var2 Var1 group new_var2 new_var1
a b f 1 1
a c f 2 1
a d f 3 1
a e f 4 1
b a g 1 1
b c g 2 1
b d g 3 1
b e g 4 1
c a f 2 5
c b f 3 2
c d f 4 2
c e f 5 2
d a i 1 1
d b i 2 1
d c i 3 1
d e i 4 1
e a f 3 6
e b f 4 3
e c f 5 6
e d f 6 3",header=TRUE)
df = df[,c("Var2","Var1","group")]
Code
library(reshape2)
library(dplyr)
df$id = seq(1,nrow(df))
df2 = melt(df, id.vars=c("id", "group")) %>% arrange(id)
df2 = df2 %>% group_by(group,value) %>% mutate(n= row_number())
df = df %>% left_join(df2[df2$variable=="Var1",c("id","n")], by="id")
df = df %>% left_join(df2[df2$variable=="Var2",c("id","n")], by="id")
colnames(df)[colnames(df)=="n.x"]="new_var1"
colnames(df)[colnames(df)=="n.y"]="new_var2"
If a single row can contain the same value in both variables (which is not the case in your example), you can optionally add the following line before the joins:
df2 = df2 %>% group_by(group,value,id) %>% mutate(n=max(n))
Hope this helps!
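To make the optional step concrete, here is a minimal base-R sketch on made-up long-format data in which the second row of the original table holds `a` in both variables. The toy data and the column name `n_max` are assumptions for illustration only; `ave()` stands in for the grouped `mutate()` calls.

```r
# Toy long-format data (hypothetical): id 2 has "a" in both Var2 and Var1.
long <- data.frame(
  id       = c(1, 1, 2, 2),
  group    = "f",
  variable = c("Var2", "Var1", "Var2", "Var1"),
  value    = c("a", "b", "a", "a"),
  stringsAsFactors = FALSE
)

# Running count of each value within its group
# (the role played by group_by(group, value) + row_number()).
by_value <- interaction(long$group, long$value, drop = TRUE)
long$n <- ave(seq_along(long$value), by_value, FUN = seq_along)

# Optional step: every occurrence inside one original row gets that row's
# highest count (the role of group_by(group, value, id) + max(n)).
by_value_row <- interaction(long$group, long$value, long$id, drop = TRUE)
long$n_max <- ave(long$n, by_value_row, FUN = max)

long
```

Without the optional step the two `a` entries of id 2 get different counts (2 and 3); with it, both report 3, so a value repeated within one row is counted as a single event for that row.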
Answer 2 (score: 1)
The `dcast()` function from the `data.table` package allows us to reshape multiple value variables simultaneously. This can be used to avoid the double left join in Florian's answer:
library(data.table)
long <- melt(setDT(x)[, rn := .I], id.vars = c("rn", "group"),
             measure.vars = c("Var1", "Var2"), value.name = "Var")[
               , variable := rleid(variable)][
                 order(rn), new_var := rowid(group, Var)][]
dcast(long, rn + group ~ ..., value.var = c("Var", "new_var"))[, rn := NULL][]
    group Var_1 Var_2 new_var_1 new_var_2
 1:     f     b     a         1         1
 2:     f     c     a         1         2
 3:     f     d     a         1         3
 4:     f     e     a         1         4
 5:     g     a     b         1         1
 6:     g     c     b         1         2
 7:     g     d     b         1         3
 8:     g     e     b         1         4
 9:     f     a     c         5         2
10:     f     b     c         2         3
11:     f     d     c         2         4
12:     f     e     c         2         5
13:     i     a     d         1         1
14:     i     b     d         1         2
15:     i     c     d         1         3
16:     i     e     d         1         4
17:     f     a     e         6         3
18:     f     b     e         3         4
19:     f     c     e         6         5
20:     f     d     e         3         6
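`setDT(x)` coerces `x` to a `data.table`; a column containing the row numbers is then added before reshaping from wide to long format. Purely to get nicer-looking column names from the subsequent `dcast()`, the variables are renamed (for this, `[, variable := sub("Var", "", variable)]` could be used as an alternative to `[, variable := rleid(variable)]`).
The important step is numbering the occurrences of each `Var` within each `group`, using `rowid()` on `group` and `Var`.
Now the result has two value columns. Finally, it is reshaped from long to wide format again, and the `rn` column, which is no longer needed, is removed.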
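The two helper steps can be illustrated without data.table. The sketch below uses made-up vectors and base R stand-ins that I believe match what `rleid()` and `rowid()` compute (run ids for consecutive identical values, and a running count per combination of keys); it is meant as an aid to reading the chain above, not as a replacement for it. Note that the answer applies `rowid()` in `order(rn)`, whereas this toy example simply counts in the order given.

```r
# Made-up long-format vectors (hypothetical, not the question's data).
variable <- c("Var1", "Var1", "Var1", "Var2", "Var2", "Var2")
group    <- c("f", "f", "g", "f", "f", "g")
Var      <- c("b", "a", "a", "a", "c", "b")

# Stand-in for rleid(variable): number the runs of identical
# consecutive values, so "Var1" becomes 1 and "Var2" becomes 2 here.
runs <- rle(variable)
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Stand-in for rowid(group, Var): running count of each (group, Var)
# combination, in the order the rows are given.
key <- interaction(group, Var, drop = TRUE)
new_var <- ave(seq_along(Var), key, FUN = seq_along)

data.frame(variable, run_id, group, Var, new_var)
```

The fourth element is the second time `a` shows up in group `f`, so it is the only entry numbered 2.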
Data
x <- expand.grid(letters[1:5], letters[1:5], KEEP.OUT.ATTRS = FALSE)
x <- x[x[, 1] != x[, 2], c(2, 1)]
x <- data.frame(
x,
group = as.character(rep(letters[c(1, 2, 1, 4, 1) + 5], each = 4)),
new_var1 = c(1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 5, 1, 2, 3, 4, 3, 4, 5, 6),
new_var2 = c(1, 1, 1, 1, 1, 1, 1, 1, 5, 2, 2, 2, 1, 1, 1, 1, 6, 3, 6, 3))