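Below is the sample data I am using in my analysis. What I need to do is extract the top 3 values in each row, together with the names of the columns they come from. For example, this would be the output for the first 3 rows: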
id, group1, weight1, group2, weight2, group3, weight3
1, V4, 0.277991043, V10, 0.050863724, V2, 0.033589251
2, V5, 0.164107486, V4, 0.119961612, V3, 0.098208573
3, V3, 0.124760077, V5, 0.089891235, V2, 0.071337172
What is the easiest way to do this?
Answer 0 (score: 2)
Here is another idea that keeps the data in a tidy format:
library(dplyr)
library(tidyr)
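sample %>%
  # reshape to long format: one row per (node, key, value)
  gather(key, value, -node) %>%
  group_by(node) %>%
  # with no wt argument, top_n() ranks by the last column (value)
  top_n(3) %>%
  # here we use arrange() to sort by node and value
  arrange(node, desc(value))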
Which gives:
#Source: local data frame [75 x 3]
#Groups: node [25]
#
# node key value
# <int> <chr> <dbl>
#1 1 V4 0.27799104
#2 1 V10 0.05086372
#3 1 V2 0.03358925
#4 2 V5 0.16410749
#5 2 V4 0.11996161
#6 2 V3 0.09820857
#7 3 V3 0.12476008
#8 3 V5 0.08989123
#9 3 V2 0.07133717
#10 4 V6 0.20665387
#.. ... ... ...
If you really want to get to your desired output, you could do:
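sample %>%
  gather(key, value, -node) %>%
  group_by(node) %>%
  top_n(3) %>%
  arrange(node, desc(value)) %>%
  # number the top 3 entries within each node: group1..group3, weight1..weight3
  mutate(group = paste0("group", row_number()),
         weight = paste0("weight", row_number())) %>%
  # each spread() leaves NA in the cells that belong to the other ranks
  spread(group, value) %>%
  spread(weight, key) %>%
  # collapse the partially filled rows of each node into a single row
  summarise_each(funs(max(., na.rm = TRUE)))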
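Which gives the following. Note that in this result the group columns hold the values and the weight columns hold the variable names, the reverse of the naming used in the question: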
#Source: local data frame [25 x 7]
#
# node group1 group2 group3 weight1 weight2 weight3
# <int> <dbl> <dbl> <dbl> <chr> <chr> <chr>
#1 1 0.2779910 0.05086372 0.033589251 V4 V10 V2
#2 2 0.1641075 0.11996161 0.098208573 V5 V4 V3
#3 3 0.1247601 0.08989123 0.071337172 V3 V5 V2
#4 4 0.2066539 0.14747281 0.121561100 V6 V2 V10
#5 5 0.2773512 0.21849008 0.158989123 V1 V8 V3
#6 6 0.1509917 0.11964171 0.117722329 V9 V3 V10
#7 7 0.2415227 0.13595649 0.130838132 V9 V7 V8
#8 8 0.1090851 0.10588612 0.088611644 V9 V7 V5
#9 9 0.1868202 0.11548305 0.089571337 V10 V1 V6
#10 10 0.3429303 0.12955854 0.003838772 V5 V6 V11
#.. ... ... ... ... ... ... ...
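For readers on newer package versions, the same reshaping can be sketched with pivot_longer(), slice_max() and pivot_wider(). This is only a minimal sketch, assuming dplyr 1.0.0 or later and tidyr 1.0.0 or later; the toy data frame and its values are made up for illustration and are not the original sample data. Here the names go into the group columns and the values into the weight columns, as in the question.

```r
library(dplyr)
library(tidyr)

# Toy data, made up for illustration (NOT the original `sample` data)
toy <- data.frame(
  node = 1:2,
  V1 = c(0.10, 0.40),
  V2 = c(0.50, 0.05),
  V3 = c(0.30, 0.20),
  V4 = c(0.20, 0.35)
)

top3 <- toy %>%
  # long format: one row per (node, group, weight)
  pivot_longer(-node, names_to = "group", values_to = "weight") %>%
  group_by(node) %>%
  # keep the 3 largest weights per node, dropping extra rows on ties
  slice_max(weight, n = 3, with_ties = FALSE) %>%
  # sort explicitly so that rank 1 is the largest weight within each node
  arrange(desc(weight), .by_group = TRUE) %>%
  mutate(rank = row_number()) %>%
  ungroup() %>%
  # wide format: group1..group3 and weight1..weight3
  pivot_wider(names_from = rank, values_from = c(group, weight), names_sep = "")

print(top3)
```

The long-format result before pivot_wider() is usually the easier one to keep working with; the wide step is only needed if the one-row-per-node layout is required.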
Answer 1 (score: 0)
We can use apply:
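# df1 is the input data frame; its first column is the id
res <- cbind(df1[1], t(apply(df1[-1], 1, function(x) {
  # column positions ordered from largest to smallest value in this row
  i1 <- order(-x)
  # interleave the top 3 column names with their values (all coerced to character)
  c(rbind(names(df1)[-1][i1][1:3], x[i1][1:3]))
})))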
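Because the names and values were combined into a single character vector, we then convert the columns back to their proper types: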
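# as.is = TRUE keeps text columns as character instead of turning them into factors
res[] <- lapply(res, function(x) type.convert(as.character(x), as.is = TRUE))
# name the new columns group, weight, group.1, weight.1, ...
names(res)[-1] <- make.unique(rep(c("group", "weight"), (ncol(res)-1)/2))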
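The type conversion step is needed only because names and numbers were mixed in one vector. A possible variant, sketched here in base R under the assumption that the first column is the id and all remaining columns are numeric, keeps names and values separate so nothing is coerced to character. The toy data frame is made up for illustration; df1 simply mirrors the name used above.

```r
# Toy data, made up for illustration; `df1` mirrors the name used in the answer
df1 <- data.frame(
  node = 1:2,
  V1 = c(0.10, 0.40),
  V2 = c(0.50, 0.05),
  V3 = c(0.30, 0.20),
  V4 = c(0.20, 0.35)
)

vals <- as.matrix(df1[-1])

# For each row, the column positions of its 3 largest values.
# The result has 3 rows and one column per row of `vals`.
idx <- apply(vals, 1, function(x) order(x, decreasing = TRUE)[1:3])

res <- data.frame(node = df1$node)
for (k in 1:3) {
  # name of the k-th largest column in every row
  res[[paste0("group", k)]] <- colnames(vals)[idx[k, ]]
  # matching value, picked with (row, column) matrix indexing
  res[[paste0("weight", k)]] <- vals[cbind(seq_len(nrow(vals)), idx[k, ])]
}

print(res)
```

This also produces the interleaved group1, weight1, group2, weight2, ... column order asked for in the question.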