Return a vector of the majority factor level for every n rows/observations in R

Time: 2016-09-18 17:24:21

Tags: r rows mode

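For example, I have a data frame with two factor variables and 1000 rows. I would like to reduce the number of observations to 200 by returning a vector that gives the most frequently occurring level for every 5 rows.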

I would like the output to have two columns, one for each variable. Sample data:

 df <- data.frame(test=factor(sample(c("A","B", "C" ),1000,replace=TRUE)))
 df$test2 <- factor(sample(c("dog", "cat", "fish"), 1000, replace=TRUE))
 head(df, 15)

     test test2
1     C  fish
2     B   dog
3     A  fish
4     B  fish
5     B   dog
6     A   cat
7     B   cat
8     C  fish
9     C  fish
10    C   cat
11    B   dog
12    A  fish
13    B   dog
14    B   cat
15    C   dog

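I have found examples where the most common category is found across the columns within a row, but not down a column over a number of rows. Thanks in advance for any suggestions.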

1 answer:

Answer 0: (score: 0)

We can try data.table. Convert the 'data.frame' to a 'data.table' (setDT(df)), then count the rows (.N) grouped by 'test', 'test2', and a grouping variable ('grp') created by repeating 1:200 five times each. Next, grouped by 'grp', take the subset of the data.table (.SD) where 'N' is largest (which.max(N)). If needed, the 'grp' and 'N' columns can be dropped by assigning them to 'NULL'.

library(data.table)
res <- setDT(df)[, .N, by = .(test, test2, grp = rep(1:200, each = 5))
             ][, .SD[which.max(N)], by = grp][, c("grp", "N") := NULL][]
dim(res)
#[1] 200   2
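
The grouping variable is what turns "every 5 rows" into something `by` can use. As a minimal sketch in base R (the names `n_rows`, `block`, `grp_rep`, and `grp_ceiling` are illustrative and not from the original answer), the same index can be built with `ceiling()`, which also copes with a row count that is not a multiple of the block size:

```r
n_rows <- 1000   # number of rows in the data (as in the question)
block  <- 5      # rows per block

# The index used in the answer: 1,1,1,1,1,2,2,2,2,2,...,200
grp_rep <- rep(1:200, each = block)

# An equivalent index computed from the row number
grp_ceiling <- ceiling(seq_len(n_rows) / block)

# Both give the same block labels (rep() returns integers, ceiling() doubles)
same_index <- identical(as.numeric(grp_rep), grp_ceiling)
print(same_index)

# With 12 rows the last block is simply shorter (2 rows instead of 5)
print(table(ceiling(seq_len(12) / block)))
```

With the `ceiling()` form, `rep(1:200, each = 5)` would not need to be edited by hand when the number of rows changes.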

Because the OP did not use set.seed when creating the sample, the output will differ from run to run. Using the first 15 rows shown in the OP's post:

setnames(setDT(df1)[, .N, by = .(test, test2, grp= rep(1:3, each = 5))
   ][, .SD[which.max(N)] , grp][,  c("grp", "N") := NULL][], paste0(names(df1), "ANS"))[]
#    testANS test2ANS
#1:       B      dog
#2:       C     fish
#3:       B      dog

Update

Based on the comments, it seems the frequencies should be computed separately for each column:

setDT(df1)[,  grp:= rep(1:3, each = 5)][,
     testN := .N ,by = .(grp, test)][, test2N := .N, by = .(grp, test2)
       ][, .(testANS = test[which.max(testN)], test2ANS = test2[which.max(test2N)]), grp]
#   grp testANS  test2ANS
#1:   1       B      fish
#2:   2       C       cat
#3:   3       B       dog

Note: for the original dataset, change rep(1:3, each = 5) to rep(1:200, each = 5).

Data
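
For comparison, the per-column version can also be sketched in base R without data.table. This is not the answer's method, only an illustration under stated assumptions: `block_mode` is a hypothetical helper, and ties are broken differently. Here `table()` sorts the values, so a tie goes to the value that sorts first, whereas `test[which.max(testN)]` in the update returns the tied value that appears first in the block. The 15 sample rows contain no ties, so both approaches agree on them.

```r
# The 15 sample rows from the question
df1 <- data.frame(
  test  = c("C", "B", "A", "B", "B", "A", "B", "C",
            "C", "C", "B", "A", "B", "B", "C"),
  test2 = c("fish", "dog", "fish", "fish", "dog", "cat", "cat", "fish",
            "fish", "cat", "dog", "fish", "dog", "cat", "dog"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: most frequent value of x in consecutive blocks of n rows
block_mode <- function(x, n = 5) {
  grp <- ceiling(seq_along(x) / n)   # block index 1,1,1,1,1,2,2,...
  vapply(
    split(as.character(x), grp),
    # table() counts each value; which.max() picks the first maximum
    function(v) names(which.max(table(v))),
    character(1),
    USE.NAMES = FALSE
  )
}

res <- data.frame(
  testANS  = block_mode(df1$test),
  test2ANS = block_mode(df1$test2)
)
print(res)
#   testANS test2ANS
# 1       B     fish
# 2       C      cat
# 3       B      dog
```

For the full 1000-row data frame, the same call with `n = 5` would return 200 values per column.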

df1 <- structure(list(test = c("C", "B", "A", "B", "B", "A", "B", "C", 
"C", "C", "B", "A", "B", "B", "C"), test2 = c("fish", "dog", 
"fish", "fish", "dog", "cat", "cat", "fish", "fish", "cat", "dog", 
"fish", "dog", "cat", "dog")), .Names = c("test", "test2"),
 class = "data.frame", row.names = c(NA, -15L))