Question

My data frame is set up as follows:

Black      White       Red       Blue
 0.8        0.1        0.07      0.03
 0.3        0.6         0        0.1
 0.1        0.6        0.25      0.05

I would like my data frame to look like this:

Black      White       Red       Blue     Color1     Color2     Color3    Color4
 0.8        0.1        0.07      0.03      0.8        0.1        0.07      0.03 
 0.3        0.6         0        0.1       0.6        0.3         0.1        0
 0.1        0.6        0.25      0.05      0.6        0.25        0.1      0.05

where Color1 is the largest value in each row, Color2 the second largest, Color3 the third largest, and Color4 the smallest value in each row.

So far I have used this function to get what I want, which produces the result above:

maxn <- function(n) function(x) order(x, decreasing = TRUE)[n]
df$Color1 <- apply(df, 1, max)
df$Color2 <- apply(df, 1, function(x)x[maxn(3)(x)])
df$Color3 <- apply(df, 1, function(x)x[maxn(4)(x)])
df$Color4 <- apply(df, 1, function(x)x[maxn(5)(x)])

Is there a more concise way to arrange my data set?

Also, slightly off-topic: I'm not sure whether it is because of the CSV file I'm working with, but whenever I use

df$Color2 <- apply(df, 1, function(x)x[maxn(2)(x)])

it returns the same result as

apply(df, 1, max)

and

apply(df, 1, function(x)x[maxn(1)(x)])

Answer 1

One option is to use sort with apply, transpose the result with t, and then cbind it to the data frame:

cbind(df, t(apply(df, 1, sort, decreasing = TRUE)))

#   Black White  Red Blue   1    2    3    4
# 1   0.8   0.1 0.07 0.03 0.8 0.10 0.07 0.03
# 2   0.3   0.6 0.00 0.10 0.6 0.30 0.10 0.00
# 3   0.1   0.6 0.25 0.05 0.6 0.25 0.10 0.05
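
To see why the t() is needed, here is a small sketch that rebuilds the example data from the question (the data.frame() call is my reconstruction of the posted values, and sorted_cols / sorted_rows are just illustrative names). apply() over rows hands back each sorted row as a column, so the result has to be flipped before it lines up with df:

```r
# Rebuild the example data from the question (values copied from the post).
df <- data.frame(
  Black = c(0.8, 0.3, 0.1),
  White = c(0.1, 0.6, 0.6),
  Red   = c(0.07, 0, 0.25),
  Blue  = c(0.03, 0.1, 0.05)
)

# apply() over rows returns each sorted row as a COLUMN of the result,
# so the intermediate matrix is ncol(df) x nrow(df), i.e. 4 x 3 here.
sorted_cols <- apply(df, 1, sort, decreasing = TRUE)
dim(sorted_cols)

# t() flips it back to one sorted row per original row (3 x 4),
# which is the shape cbind() needs.
sorted_rows <- t(sorted_cols)
dim(sorted_rows)
```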

Update: following @dww's suggestion, the column names can be specified like this:

df[paste0('color',1:4)] = t(apply(df, 1, sort, decreasing = TRUE))

#   Black White  Red Blue color1 color2 color3 color4
# 1   0.8   0.1 0.07 0.03    0.8   0.10   0.07   0.03
# 2   0.3   0.6 0.00 0.10    0.6   0.30   0.10   0.00
# 3   0.1   0.6 0.25 0.05    0.6   0.25   0.10   0.05
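
As for the side question: this is probably not related to the CSV file. A likely explanation, assuming the lines were run in the order shown in the question, is that once df$Color1 exists, every later apply(df, 1, ...) sees five columns and the row maximum appears twice, so the "second largest" value is the maximum again. The sketch below reproduces this with the example data (the data.frame() call and the names cols and second are mine, not from the question):

```r
df <- data.frame(
  Black = c(0.8, 0.3, 0.1),
  White = c(0.1, 0.6, 0.6),
  Red   = c(0.07, 0, 0.25),
  Blue  = c(0.03, 0.1, 0.05)
)
maxn <- function(n) function(x) order(x, decreasing = TRUE)[n]

# Remember the original columns before anything is added.
cols <- names(df)

# After this assignment df has five columns, and the maximum of each
# row is present twice (once in its original column, once in Color1).
df$Color1 <- apply(df, 1, max)

# Positions 1 and 2 of the ordering now both point at the maximum.
second <- apply(df, 1, function(x) x[maxn(2)(x)])
all(second == df$Color1)

# Restricting apply() to the original columns gives the real runner-up.
df$Color2 <- apply(df[cols], 1, function(x) x[maxn(2)(x)])
```

Assigning all the sorted columns in one step, as above, sidesteps the problem because the sort only ever sees the original columns.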

Answer 2

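This is considerably more complicated, but if you are dealing with a large number of rows, a faster solution is to do the sorting/ordering only once and reshape the result back into a matrix: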

matrix(x[order(-row(x), x, decreasing=TRUE)], nrow=nrow(x), ncol=ncol(x), byrow=TRUE)
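
Note that x here has to be a matrix. A sketch of how this could be applied to the data frame from the question, assuming all columns are numeric so that as.matrix() is safe (the data.frame() call, idx, and sorted are illustrative names of mine):

```r
df <- data.frame(
  Black = c(0.8, 0.3, 0.1),
  White = c(0.1, 0.6, 0.6),
  Red   = c(0.07, 0, 0.25),
  Blue  = c(0.03, 0.1, 0.05)
)

# Vector-style indexing x[idx] needs a matrix, not a data frame.
x <- as.matrix(df)

# One call to order(): with decreasing = TRUE, the key -row(x) keeps the
# rows in ascending order, and the key x sorts values within each row
# from largest to smallest.
idx <- order(-row(x), x, decreasing = TRUE)

# x[idx] is now row 1 sorted, then row 2 sorted, and so on,
# so filling by row restores the original shape.
sorted <- matrix(x[idx], nrow = nrow(x), ncol = ncol(x), byrow = TRUE)

df[paste0("Color", 1:4)] <- sorted
df
```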

Some timings:

x <- matrix(rnorm(300000*5), nrow=300000, ncol=5)
system.time(t(apply(x, 1, sort, decreasing=TRUE)))
#   user  system elapsed 
#  14.13    0.00   14.13 
system.time(
  matrix(x[order(-row(x),x, decreasing=TRUE)], nrow=nrow(x), ncol=ncol(x), byrow=TRUE)
)
#   user  system elapsed 
#   0.10    0.00    0.09
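
The exact timings will vary by machine and R version. If you want to convince yourself that the two approaches agree before switching, a quick check on a smaller random matrix might look like this (the seed, the size, and the names by_apply / by_order are arbitrary choices of mine):

```r
set.seed(1)
x <- matrix(rnorm(1000 * 5), nrow = 1000, ncol = 5)

# Row-by-row sort via apply(), transposed back to 1000 x 5.
by_apply <- t(apply(x, 1, sort, decreasing = TRUE))

# Single order() call, reshaped by row.
by_order <- matrix(x[order(-row(x), x, decreasing = TRUE)],
                   nrow = nrow(x), ncol = ncol(x), byrow = TRUE)

# Stops with an error if the two results differ.
stopifnot(isTRUE(all.equal(by_apply, by_order)))
```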

Rearranging each row from largest to smallest value in R
