Question

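I have a large data set of raw responses, and I want to compare each element of subject 1 in group 1 with the corresponding element of subject 1 in group 2. The same has to be done for subject 2 in group 1 and subject 2 in group 2, subject 3 in group 1 and subject 3 in group 2, and so on. What complicates the problem is that there are 100 groups, arranged as 50 paired groups. If the two values are the same, the output should keep the original raw response. If they differ, the raw response should be replaced with "9".

I'm pretty sure I can do this with a for loop, but I wonder whether R has something better than a for loop, such as ifelse or apply?

To keep things simple, the data looks like this.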

df <- as.data.frame(matrix(sample(1:5, 60, replace = TRUE), nrow = 12))
df$subject <- rep(1:3, times = 4)
df$group <- rep(1:4, each = 3)

Thanks for your help.

Answer 1

> df
   V1 V2 V3 V4 V5 subject group
1   3  3  3  4  5       1     1
2   4  4  3  1  3       2     1
3   3  2  2  4  2       3     1
4   4  4  3  5  3       1     2
5   3  2  1  5  1       2     2
6   2  5  4  4  1       3     2
7   3  2  3  2  2       1     3
8   1  2  3  3  3       2     3
9   2  2  2  2  5       3     3
10  3  3  3  5  4       1     4
11  5  3  5  4  2       2     4
12  5  3  1  1  3       3     4

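# Processing without a for loop.
# Assumption: the data is already sorted by group (easy to arrange beforehand).

# Logical vector marking the response columns (everything except the id columns)
columns <- !names(df) %in% c('group', 'subject')
subjects <- df[, 'subject']
tabl <- table(subjects)      # number of rows per subject
rows <- order(subjects)      # row indices arranged subject by subject
rows2 <- cumsum(tabl)        # position in `rows` of each subject's last row
rows1 <- rows2 - tabl + 1    # position in `rows` of each subject's first row
# Compare each row with the same subject's row in the previous group; flag mismatches with 9
df[rows[-rows1], columns][df[rows[-rows1], columns] != df[rows[-rows2], columns]] <- 9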

Output after processing without a for loop:

> df
   V1 V2 V3 V4 V5 subject group
1   3  3  3  4  5       1     1
2   4  4  3  1  3       2     1
3   3  2  2  4  2       3     1
4   9  9  3  9  9       1     2
5   9  9  9  9  9       2     2
6   9  9  9  4  9       3     2
7   9  9  3  9  9       1     3
8   9  2  9  9  9       2     3
9   2  9  9  9  9       3     3
10  3  9  3  9  9       1     4
11  9  9  9  9  9       2     4
12  9  9  9  9  9       3     4

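A small sketch, on made-up data (the `toy` data frame and its values are my own, not from the answer), of which rows the index vectors in this answer actually compare. As far as I can tell, each row is checked against the same subject's row in the preceding group (2 vs 1, 3 vs 2, 4 vs 3), rather than within fixed pairs (1 vs 2, 3 vs 4), so it may need adjusting for paired groups:

```r
# Hypothetical toy data: 3 subjects x 4 groups, sorted by group as the answer assumes
toy <- data.frame(
  V1 = c(1, 2, 3,  1, 5, 3,  1, 5, 4,  2, 5, 4),
  subject = rep(1:3, times = 4),
  group = rep(1:4, each = 3)
)

tabl  <- table(toy$subject)
rows  <- order(toy$subject)                  # 1 4 7 10 2 5 8 11 3 6 9 12
rows2 <- as.integer(cumsum(tabl))            # last position of each subject in `rows`
rows1 <- rows2 - as.integer(tabl) + 1L       # first position of each subject in `rows`

# Row pairs that get compared: target row vs. the same subject one group earlier
compared <- data.frame(target = rows[-rows1], reference = rows[-rows2])
print(compared)

# Apply the rule to the single response column
changed <- toy$V1[compared$target] != toy$V1[compared$reference]
result <- toy
result$V1[compared$target[changed]] <- 9
print(result)
```

Note that the comparison uses the original values on both sides, so a row can be kept even when its reference row was itself replaced with 9.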

Answer 2

Here is what I did to get the output. Thanks again, Stanislav.

df <- as.data.frame(matrix(sample(1:5, 60, replace = TRUE), nrow = 12))
df$subject <- rep(1:3, times = 4)
df$group <- rep(1:4, each = 3)

> df
   V1 V2 V3 V4 V5 subject group
1   1  4  3  1  5       1     1
2   2  1  4  1  5       2     1
3   1  2  5  4  5       3     1
4   5  4  1  4  3       1     2
5   5  1  3  2  2       2     2
6   1  2  2  4  5       3     2
7   5  4  2  3  1       1     3
8   2  3  4  3  5       2     3
9   2  5  3  5  3       3     3
10  4  2  1  4  1       1     4
11  2  3  3  5  5       2     4
12  5  3  3  4  5       3     4

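# Logical vector marking the response columns
col <- !dimnames(df)[[2]] %in% c('subject', 'group')
n <- nrow(df)
n.sub <- table(df$group)[1]            # rows per group (assumes equal group sizes)
# Row indices of the first group in each pair (groups 1, 3, ...)
temp <- seq(1, n, by = 2 * n.sub)
s1 <- c(sapply(temp, function(x) seq.int(x, length.out = n.sub)))
# Row indices of the second group in each pair (groups 2, 4, ...)
temp <- seq(n.sub + 1, n, by = 2 * n.sub)
s2 <- c(sapply(temp, function(x) seq.int(x, length.out = n.sub)))

# Where the second group differs from the first, replace the response with 9
df[s2, col][df[s1, col] != df[s2, col]] <- 9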

> df
   V1 V2 V3 V4 V5 subject group
1   1  4  3  1  5       1     1
2   2  1  4  1  5       2     1
3   1  2  5  4  5       3     1
4   9  4  9  9  9       1     2
5   9  1  9  9  9       2     2
6   1  2  9  4  5       3     2
7   5  4  2  3  1       1     3
8   2  3  4  3  5       2     3
9   2  5  3  5  3       3     3
10  9  9  9  9  1       1     4
11  2  3  9  9  5       2     4
12  9  9  3  9  9       3     4
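Since the question mentions ifelse, here is a rough alternative sketch that does not depend on the row order. `mask_pairs` and its arguments are hypothetical names of mine, not from either answer. It assumes groups are numbered so that (1, 2), (3, 4), ... form the pairs, that every subject appears in both groups of a pair, and that all response columns are numeric:

```r
# Hypothetical helper: flag values in the second group of each pair that
# differ from the same subject's value in the first group of that pair.
mask_pairs <- function(df, id_cols = c("subject", "group"), flag = 9) {
  value_cols <- setdiff(names(df), id_cols)
  second <- which(df$group %% 2 == 0)        # rows in groups 2, 4, ...
  # Find the row with the same subject in the preceding (odd) group
  first <- match(
    paste(df$subject[second], df$group[second] - 1),
    paste(df$subject, df$group)
  )
  stopifnot(!anyNA(first))                   # every row needs a partner
  cur <- as.matrix(df[second, value_cols, drop = FALSE])
  ref <- as.matrix(df[first, value_cols, drop = FALSE])
  # Keep the value where the pair agrees, otherwise use the flag
  df[second, value_cols] <- ifelse(cur == ref, cur, flag)
  df
}

# Made-up example data: 3 subjects x 4 groups
toy <- data.frame(
  V1 = c(1, 2, 3,  1, 5, 3,  1, 5, 4,  2, 5, 4),
  V2 = c(4, 4, 4,  4, 3, 4,  2, 2, 2,  2, 2, 1),
  subject = rep(1:3, times = 4),
  group = rep(1:4, each = 3)
)
out <- mask_pairs(toy)
print(out)
```

Matching on subject and group costs a little more than the positional indexing above, but it should still work if the rows are shuffled.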

Compare each element in subsets of a large data set
