I have a combination-type problem in R. My columns hold unique sequences, and each sequence has a value.
library(data.table)
df<-cbind(data.table(rbind(
c("A","B","C","D"),
c("A","C","D","B"),
c("A","D","B","C"),
c("A","C","B","D"),
c("A","B","D","C"),
c("A","D","C","B"),
c("A","B","D","C"),
c("A","D","C","B"),
c("A","C","B","D"),
c("E","B","C","D"),
c("E","C","D","B"),
c("E","D","B","C"),
c("E","C","B","D"),
c("E","B","D","C"),
c("E","D","C","B"),
c("E","B","D","C"),
c("E","D","C","B"),
c("E","C","B","D"))),
c(55,54,86,109,23,41,53,54,88,54,53,85,108,22,40,52,53,87))
> df
V1 V2 V3 V4 V2
1: A B C D 55
2: A C D B 54
3: A D B C 86
4: A C B D 109
5: A B D C 23
6: A D C B 41
7: A B D C 53
8: A D C B 54
9: A C B D 88
10: E B C D 54
11: E C D B 53
12: E D B C 85
13: E C B D 108
14: E B D C 22
15: E D C B 40
16: E B D C 52
17: E D C B 53
18: E C B D 87
The output should be rows 4 and 13:
> df[c(4,13),]
V1 V2 V3 V4 V2
1: A C B D 109
2: E C B D 108
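I need to select the unique sequences with the maximum value. I have considered sorting each row's columns into a single word and then taking the maximum value for each word, but I'm working with 12 million rows.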
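The idea in the question (turn each row into a sorted "word", then keep the row with the largest value per word) can be sketched in base R as below. The compact `rows` strings and the names `seqs`, `value`, `key` and `best` are hypothetical. Note that `apply()` calls an R function once per row, so this illustrates the logic rather than something tuned for 12 million rows.

```r
# The question's 18 rows, written compactly: each string is one row V1..V4.
rows <- c("ABCD", "ACDB", "ADBC", "ACBD", "ABDC", "ADCB", "ABDC", "ADCB", "ACBD",
          "EBCD", "ECDB", "EDBC", "ECBD", "EBDC", "EDCB", "EBDC", "EDCB", "ECBD")
seqs  <- do.call(rbind, strsplit(rows, ""))   # 18 x 4 character matrix
value <- c(55, 54, 86, 109, 23, 41, 53, 54, 88, 54, 53, 85, 108, 22, 40, 52, 53, 87)

# Sort each row's letters into one word, so ACBD and ADCB both become "ABCD".
key <- apply(seqs, 1, function(r) paste(sort(r), collapse = ""))

# Order by key, then by descending value: the first row of each key is its maximum.
o    <- order(key, -value)
best <- o[!duplicated(key[o])]

best                          # row numbers of the per-key maxima
seqs[best, , drop = FALSE]
value[best]
```

Using `order()` plus `!duplicated()` returns exactly one row per key even when two rows tie for the maximum.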
Answer 0 (score: 1)
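I think you should first create an id that summarizes the 4 factor columns, and then take the maximum value for each newly created id.
Given the size of your data, you should use the data.table package.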
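library(data.table)
DT <- as.data.table(df)
setnames(DT, 5, "value")  ## rename the (duplicated) 5th column
## id = row sum of the integer factor codes of V1..V4; factor() is needed because
## as.integer() on a character column returns NA. A sum of codes is not unique in general.
DT[, id := rowSums(DT[, lapply(.SD, function(x) as.integer(factor(x))), .SDcols = 1:4])][, .SD[which.max(value)], by = id]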
id V1 V2 V3 V4 value
1: 7 A C B D 109
2: 8 E C B D 108
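A sum of integer codes is not a unique id in general, because different combinations can add up to the same total (codes 1 + 4 and 2 + 3 both give 5). One collision-free, order-insensitive alternative, sketched here in base R rather than data.table, gives each distinct letter its own power of two and sums those. This assumes no letter repeats within a row and fewer than about 53 distinct letters (the range in which doubles hold these sums exactly). The names `rows`, `seqs`, `alphabet`, `codes`, `id` and `best` are hypothetical.

```r
rows <- c("ABCD", "ACDB", "ADBC", "ACBD", "ABDC", "ADCB", "ABDC", "ADCB", "ACBD",
          "EBCD", "ECDB", "EDBC", "ECBD", "EBDC", "EDCB", "EBDC", "EDCB", "ECBD")
seqs  <- do.call(rbind, strsplit(rows, ""))   # 18 x 4 character matrix
value <- c(55, 54, 86, 109, 23, 41, 53, 54, 88, 54, 53, 85, 108, 22, 40, 52, 53, 87)

# Give each distinct letter its own bit: A = 1, B = 2, C = 4, D = 8, E = 16.
alphabet <- sort(unique(as.vector(seqs)))
codes    <- matrix(match(seqs, alphabet), nrow = nrow(seqs))   # integer codes 1..5
# A sum of distinct powers of two is different for every set of letters,
# and it ignores column order (A C B D and A B C D both give 15).
id <- rowSums(2^(codes - 1))

# Highest value per id, one row each.
o    <- order(id, -value)
best <- o[!duplicated(id[o])]
best
```

This works column by column on the whole matrix, so the cost does not involve an R-level call per row.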
Answer 1 (score: 0)
Taking a stab at this. I'm not very handy in R, so in plain English:
Quick thoughts
So don't sort; just find the maximum :-)
Good luck,
William.
Answer 2 (score: 0)
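The code below produces your result, but I'm sure there are several other ways to do it. I've renamed the last column to V5: do you really have two columns named V2?
Are you interested in the single maximum value or the two largest values?
If it's the latter, the code below gives you the rows corresponding to the 2 largest values: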
dimnames(df)[[2]] <- c("V1", "V2", "V3", "V4", "V5")
df[df$V5 %in% tail(sort(df$V5), 2), ]  ## %in%, not ==, so every row is checked against both values
V1 V2 V3 V4 V5
1: A C B D 109
2: E C B D 108
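Comparing a column with the two largest values via `==` relies on recycling, whereas `%in%` tests set membership. The small base-R sketch below (the vectors `v` and `w` are hypothetical) shows that `==` only gives the right rows for this data by coincidence of position.

```r
v <- c(55, 54, 86, 109, 23, 41, 53, 54, 88, 54, 53, 85, 108, 22, 40, 52, 53, 87)
top2 <- tail(sort(v), 2)        # c(108, 109)

# `==` recycles top2 along v: odd positions are compared with 108, even ones with 109.
# Row 13 (odd) holds 108 and row 4 (even) holds 109, so this happens to work here.
by_eq <- which(v == top2)
# `%in%` checks each element against the whole set, regardless of position.
by_in <- which(v %in% top2)

# Swap the positions and the recycled comparison misses both rows.
w <- c(109, 108)
w_eq <- which(w == tail(sort(w), 2))   # integer(0)
w_in <- which(w %in% tail(sort(w), 2))
```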
The code below gives you the row with the maximum value (if that's what you're after):
df[df$V5 == max(df$V5), ]
V1 V2 V3 V4 V5
1: A C B D 109
Answer 3 (score: 0)
Avoid sorting; it does far more work than you need. And since this is R, avoid loops as well. Use a matrix and "max".
thisMat <- cbind(t(matrix(c(c("A","B","C","D"),
c("A","C","D","B"),
c("A","D","B","C"),
c("A","C","B","D"),
c("A","B","D","C"),
c("A","D","C","B"),
c("A","B","D","C"),
c("A","D","C","B"),
c("A","C","B","D"),
c("E","B","C","D"),
c("E","C","D","B"),
c("E","D","B","C"),
c("E","C","B","D"),
c("E","B","D","C"),
c("E","D","C","B"),
c("E","B","D","C"),
c("E","D","C","B"),
c("E","C","B","D")),nrow=4)),c(55,54,86,109,23,41,53,54,88,54,53,85,108,22,40,52,53,87))
thisMat[as.numeric(thisMat[,5]) == max(as.numeric(thisMat[,5])), , drop = FALSE]
[,1] [,2] [,3] [,4] [,5]
[1,] "A" "C" "B" "D" "109"
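The matrix approach above returns only the single global maximum, and because `cbind()` turns the values into strings it needs `as.numeric()`. The sketch below keeps the same sort-free, matrix-based spirit but stores the values in their own numeric vector and picks one maximum per combination. It assumes each row is a set of distinct letters; the names `rows`, `seqs`, `alphabet`, `key` and `is_top` are hypothetical.

```r
rows <- c("ABCD", "ACDB", "ADBC", "ACBD", "ABDC", "ADCB", "ABDC", "ADCB", "ACBD",
          "EBCD", "ECDB", "EDBC", "ECBD", "EBDC", "EDCB", "EBDC", "EDCB", "ECBD")
seqs  <- do.call(rbind, strsplit(rows, ""))   # letters only, 18 x 4 character matrix
value <- c(55, 54, 86, 109, 23, 41, 53, 54, 88, 54, 53, 85, 108, 22, 40, 52, 53, 87)

# Combination key from whole-matrix comparisons, one pass per distinct letter
# (not one call per row): a row containing A, B, C and D gets the key "ABCD".
alphabet <- sort(unique(as.vector(seqs)))
key <- do.call(paste0, lapply(alphabet, function(l) ifelse(rowSums(seqs == l) > 0, l, "")))

# ave() writes each key's maximum back onto every row with that key;
# rows equal to it are the per-combination maxima (ties are all kept).
is_top <- value == ave(value, key, FUN = max)
which(is_top)
seqs[is_top, , drop = FALSE]
value[is_top]
```

Unlike the `order()`/`!duplicated()` pattern, this keeps every tied row, which may or may not be what you want.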