Sort by row

Time: 2014-05-22 12:48:52

Tags: r sorting combinatorics

I have a combinatorics-type problem in R. I have columns holding unique sequences, and each sequence has a value.

library(data.table)

df <- cbind(data.table(rbind(
    c("A","B","C","D"), 
    c("A","C","D","B"), 
    c("A","D","B","C"), 
    c("A","C","B","D"), 
    c("A","B","D","C"), 
    c("A","D","C","B"), 
    c("A","B","D","C"), 
    c("A","D","C","B"), 
    c("A","C","B","D"), 
    c("E","B","C","D"), 
    c("E","C","D","B"), 
    c("E","D","B","C"), 
    c("E","C","B","D"), 
    c("E","B","D","C"), 
    c("E","D","C","B"), 
    c("E","B","D","C"), 
    c("E","D","C","B"), 
    c("E","C","B","D"))),
c(55,54,86,109,23,41,53,54,88,54,53,85,108,22,40,52,53,87))

> df
    V1 V2 V3 V4  V2
 1:  A  B  C  D  55
 2:  A  C  D  B  54
 3:  A  D  B  C  86
 4:  A  C  B  D 109
 5:  A  B  D  C  23
 6:  A  D  C  B  41
 7:  A  B  D  C  53
 8:  A  D  C  B  54
 9:  A  C  B  D  88
10:  E  B  C  D  54
11:  E  C  D  B  53
12:  E  D  B  C  85
13:  E  C  B  D 108
14:  E  B  D  C  22
15:  E  D  C  B  40
16:  E  B  D  C  52
17:  E  D  C  B  53
18:  E  C  B  D  87

The output must be rows 4 and 13:

> df[c(4,13),]
   V1 V2 V3 V4  V2
1:  A  C  B  D 109
2:  E  C  B  D 108

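I need to select the unique sequences with the maximum value. I have considered sorting the columns row-wise into a single word and then taking the maximum for each word, but I am working with 12 million rows.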
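The row-wise "word" idea described above can be sketched in base R as follows. This is only an illustration on a four-row toy subset of the data; the names `seqs`, `value`, `key`, `best` and `result` are made up for the example, and because `apply()` loops over rows it does not address the 12-million-row concern.

```r
# Toy subset of the question's data (hypothetical variable names).
seqs <- rbind(
  c("A", "B", "C", "D"),
  c("A", "C", "B", "D"),
  c("E", "B", "C", "D"),
  c("E", "C", "B", "D")
)
value <- c(55, 109, 54, 108)

# Sort the letters within each row and collapse them into one "word".
key <- apply(seqs, 1, function(r) paste(sort(r), collapse = ""))

# For each word, keep the index of the row holding the largest value.
best <- as.integer(tapply(seq_along(value), key,
                          function(i) i[which.max(value[i])]))

result <- data.frame(seqs[best, , drop = FALSE], value = value[best])
print(result)
```

Rows that contain the same letters in a different order share a key, so the maximum is taken per combination rather than per exact sequence.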

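4 answers:

Answer 0: (score: 1)

I think you should first create an ID that summarizes the 4 factor columns, then take the maximum value within each newly created id.

Given the size of the data, you should use the data.table package.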

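library(data.table)
DT <- as.data.table(df)
setnames(DT, 5, 'value')   ## just rename the column
## id = sum of the integer codes of V1..V4 (as.integer() only works here if they are factors)
DT[, id := rowSums(DT[, lapply(.SD[, -5, with = FALSE], as.integer)])][, .SD[which.max(value)], id]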
  id V1 V2 V3 V4 value
1:  7  A  C  B  D   109
2:  8  E  C  B  D   108
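One thing worth checking before relying on a summed id on other data: sums of integer codes are not guaranteed to be unique per combination. A minimal sketch, using a hypothetical coding of six letters (not the factor levels from the question), where two different letter sets share the same sum:

```r
# Hypothetical integer codes for six letters (an assumption for illustration).
codes <- c(A = 1L, B = 2L, C = 3L, D = 4L, E = 5L, F = 6L)

set1 <- c("A", "B", "E", "F")
set2 <- c("A", "C", "D", "F")

sum1 <- sum(codes[set1])  # 1 + 2 + 5 + 6
sum2 <- sum(codes[set2])  # 1 + 3 + 4 + 6

# TRUE when two different letter sets end up with the same summed id.
collides <- sum1 == sum2 && !setequal(set1, set2)
print(collides)
```

With the question's data the two ids (7 and 8) happen to be distinct, so the result above is unaffected.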

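Answer 1: (score: 0)

Saw this in passing. I'm not very handy with R, so in plain English:

Quick thoughts

  • Sorting is unnecessary overhead. You only need the largest value.
  • Evaluate the data row by row, updating a result variable whenever the current row's value is greater than the maximum so far.
  • This way you only need to pass over the data once.
  • If you need the two largest values, use the same strategy but with two result variables (one for the max, one for the second largest).
  • If equal values are possible, make the result a vector of length > 1.

So don't sort, just find the maximum :-)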
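The single-pass strategy above could be sketched in R like this. It is a minimal illustration, not code from the answer: `best_value` and `best_rows` are hypothetical names, and `value` holds the values of the nine "A" rows from the question.

```r
# Values of the nine "A" rows from the question.
value <- c(55, 54, 86, 109, 23, 41, 53, 54, 88)

best_value <- -Inf        # largest value seen so far
best_rows  <- integer(0)  # indices of the rows that attain it (ties are kept)

for (i in seq_along(value)) {
  if (value[i] > best_value) {
    # New maximum: remember it and restart the list of rows.
    best_value <- value[i]
    best_rows  <- i
  } else if (value[i] == best_value) {
    # Tie with the current maximum: keep this row as well.
    best_rows <- c(best_rows, i)
  }
}

print(best_value)
print(best_rows)
```

In R itself, `which(value == max(value))` returns the same indices without an explicit loop.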

Good luck,

William.

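Answer 2: (score: 0)

The code below produces your result, but I'm sure there are several other ways to do this. I have renamed the last column to V5; did you really have two columns named V2?

Are you interested in the maximum value or in the two largest values?

If it is the two largest, the code below gives you the rows corresponding to them: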

dimnames(df)[[2]] <- c("V1", "V2", "V3", "V4", "V5")

# %in% tests membership; == would recycle the length-2 vector along the column
df[df$V5 %in% tail(sort(df$V5), 2), ]

   V1 V2 V3 V4  V5
1:  A  C  B  D 109
2:  E  C  B  D 108

The code below gives you the row with the maximum value (if that is what you are after):

df[df$V5 == max(df$V5), ]

    V1 V2 V3 V4  V5
1:  A  C  B  D  109

Answer 3: (score: 0)

Avoid sorting; it does way more than you need. Since this is R, avoid loops as well. Use a matrix and "max".

thisMat <- cbind(t(matrix(c(c("A","B","C","D"),
    c("A","C","D","B"), 
    c("A","D","B","C"), 
    c("A","C","B","D"), 
    c("A","B","D","C"), 
    c("A","D","C","B"), 
    c("A","B","D","C"), 
    c("A","D","C","B"), 
    c("A","C","B","D"), 
    c("E","B","C","D"), 
    c("E","C","D","B"), 
    c("E","D","B","C"), 
    c("E","C","B","D"), 
    c("E","B","D","C"), 
    c("E","D","C","B"), 
    c("E","B","D","C"), 
    c("E","D","C","B"), 
    c("E","C","B","D")),nrow=4)),c(55,54,86,109,23,41,53,54,88,54,53,85,108,22,40,52,53,87))

thisMat[as.numeric(thisMat[,5]) == max(as.numeric(thisMat[,5])), , drop = FALSE]
     [,1] [,2] [,3] [,4] [,5] 
[1,] "A"  "C"  "B"  "D"  "109"