Question

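For example, I have a data frame with two factor variables and 1000 rows. I want to reduce it to 200 observations by returning, for every 5 rows, the most frequently occurring level of each factor.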

I would like the output to have two columns, one for each factor. Example data:

 df <- data.frame(test=factor(sample(c("A","B", "C" ),1000,replace=TRUE)))
 df$test2 <- factor(sample(c("dog", "cat", "fish"), 1000, replace=TRUE))
 head(df, 15)

     test test2
1     C  fish
2     B   dog
3     A  fish
4     B  fish
5     B   dog
6     A   cat
7     B   cat
8     C  fish
9     C  fish
10    C   cat
11    B   dog
12    A  fish
13    B   dog
14    B   cat
15    C   dog

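I have found examples where the most common category is found across the columns within a row, but not down a column over a fixed number of rows. Thanks in advance for any advice.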

Answer 1

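We can try data.table. Convert the 'data.frame' to a 'data.table' (setDT(df)). Grouped by 'test', 'test2' and a grouping variable created by repeating each value of 1:200 five times ('grp'), count the rows (.N). Then, grouped by 'grp', take the subset of the data.table (.SD) where 'N' is the maximum (which.max(N)). If needed, we can assign the 'grp' and 'N' columns to 'NULL' to drop them. Note that this picks the most frequent combination of 'test' and 'test2' within each block of 5 rows.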

library(data.table)
res <- setDT(df)[, .N, by = .(test, test2, grp = rep(1:200, each = 5))
             ][, .SD[which.max(N)], by = grp][, c("grp", "N") := NULL][]
dim(res)
#[1] 200   2

Because the OP did not call set.seed before sample, the output will differ from run to run. Using the first 15 rows shown in the OP's post:

setnames(setDT(df1)[, .N, by = .(test, test2, grp= rep(1:3, each = 5))
   ][, .SD[which.max(N)] , grp][,  c("grp", "N") := NULL][], paste0(names(df1), "ANS"))[]
#    testANS test2ANS
#1:       B      dog
#2:       C     fish
#3:       B      dog
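
For the full 1000-row case, a reproducible variant of the same idea might look like the sketch below. The seed value, the block-size variable n and the helper vector grp are assumptions added for illustration; they are not part of the original answer. The group index is derived from the row count instead of hard-coding 200.

```r
library(data.table)

set.seed(42)  # hypothetical seed, used only so the sample is reproducible
df <- data.frame(
  test  = factor(sample(c("A", "B", "C"), 1000, replace = TRUE)),
  test2 = factor(sample(c("dog", "cat", "fish"), 1000, replace = TRUE))
)

n   <- 5L                                    # block size (assumed name)
grp <- (seq_len(nrow(df)) - 1L) %/% n + 1L   # 1,1,1,1,1,2,2,2,2,2,...

dt <- as.data.table(df)                      # works on a copy, df is untouched

# Count each (test, test2) combination within every block of n rows
counts <- dt[, .N, by = .(test, test2, grp = grp)]

# Keep the most frequent combination per block, then drop the helper columns
res <- counts[, .SD[which.max(N)], by = grp][, c("grp", "N") := NULL][]
dim(res)
#[1] 200   2
```

which.max returns the position of the first maximum, so when two combinations are tied within a block, the one that appears first in that block is kept.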

Update

Based on the comments, it seems the frequencies should be computed separately for each column:

setDT(df1)[,  grp:= rep(1:3, each = 5)][,
     testN := .N ,by = .(grp, test)][, test2N := .N, by = .(grp, test2)
       ][, .(testANS = test[which.max(testN)], test2ANS = test2[which.max(test2N)]), grp]
#   grp testANS  test2ANS
#1:   1       B      fish
#2:   2       C       cat
#3:   3       B       dog

Note: for the original dataset, change rep(1:3, each = 5) to rep(1:200, each = 5).

Data
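
The per-column version can also be sketched in base R, without data.table. This is an alternative approach, not the method from the answer above; the helper mode_level and the variables n and grp are hypothetical names introduced here. It uses the same 15 rows from the question.

```r
# First 15 rows shown in the question
df1 <- data.frame(
  test  = c("C", "B", "A", "B", "B", "A", "B", "C",
            "C", "C", "B", "A", "B", "B", "C"),
  test2 = c("fish", "dog", "fish", "fish", "dog", "cat", "cat", "fish",
            "fish", "cat", "dog", "fish", "dog", "cat", "dog"),
  stringsAsFactors = FALSE
)

# Most frequent value in x (hypothetical helper).
# table() sorts the values, so ties go to the value that sorts first.
mode_level <- function(x) names(which.max(table(x)))

n   <- 5L                                     # block size
grp <- (seq_len(nrow(df1)) - 1L) %/% n + 1L   # block index for every row

# Apply the helper to each block of n rows, one column at a time
res <- data.frame(
  testANS  = vapply(split(df1$test,  grp), mode_level, character(1),
                    USE.NAMES = FALSE),
  test2ANS = vapply(split(df1$test2, grp), mode_level, character(1),
                    USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
res
#   testANS test2ANS
# 1       B     fish
# 2       C      cat
# 3       B      dog
```

Tie-breaking differs from the data.table version: here a tie goes to the value that sorts first, whereas which.max on the per-row counts keeps the value that appears first in the block. The sample rows contain no ties, so both give the same result.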

df1 <- structure(list(test = c("C", "B", "A", "B", "B", "A", "B", "C", 
"C", "C", "B", "A", "B", "B", "C"), test2 = c("fish", "dog", 
"fish", "fish", "dog", "cat", "cat", "fish", "fish", "cat", "dog", 
"fish", "dog", "cat", "dog")), .Names = c("test", "test2"),
 class = "data.frame", row.names = c(NA, -15L))
