Three smallest values per column, and more

Time: 2017-07-26 14:44:28

Tags: r

My data contain pairwise distances between different locations (x, y, z) and (a, b, c, d, e, f, g, h, i, j). See below:

set.seed(123)
x <- rnorm(10, 15,1)
y <- rnorm(10, 7,0.1)
z <- rnorm(10, 3,0.01)

distdat <- data.frame(x,y,z)

rownames(distdat) <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j")

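I need to create another data frame that contains: 1) the column name, 2) the row names of the minimums, and 3) the three smallest values of each column. Altogether, the new data will have three columns and nine rows. Here are the first three rows (for column x):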

col_name <- c("x", "x", "x")
row_name <- c("h", "g", "a")
min_val <- c(14.21208, 14.88804, 14.98797)

newdat <- data.frame(col_name, row_name, min_val)

Similarly, this needs to be repeated for columns y and z.
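For reference, the requested result can be expressed in base R by ordering each column, keeping the first three positions, and stacking the pieces. This is a minimal sketch assuming `distdat` is built exactly as above; `smallest_n` is a hypothetical helper name, not something from the question.

```r
set.seed(123)
x <- rnorm(10, 15, 1)
y <- rnorm(10, 7, 0.1)
z <- rnorm(10, 3, 0.01)
distdat <- data.frame(x, y, z)
rownames(distdat) <- letters[1:10]   # same as c("a", "b", ..., "j")

# Hypothetical helper: the n smallest values of one column, with their row names
smallest_n <- function(df, col, n = 3) {
  v <- df[[col]]
  i <- head(order(v), n)             # positions of the n smallest values
  data.frame(col_name = col,         # length-1 value, recycled to n rows
             row_name = rownames(df)[i],
             min_val  = v[i])
}

# Apply it to every column and stack the pieces: 3 columns x 3 values = 9 rows
newdat <- do.call(rbind, lapply(names(distdat), smallest_n, df = distdat))
newdat
```

`lapply()` passes each column name as the first unmatched argument (`col`), while `df = distdat` is matched by name.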

3 answers:

Answer 0 (score: 2)

How about this:

set.seed(123)
x <- rnorm(10, 15,1)
y <- rnorm(10, 7,0.1)
z <- rnorm(10, 3,0.01)

distdat <- data.frame(x,y,z)

rownames(distdat) <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j")

# find indices of smallest values
idx <- sapply(distdat, order)[1:3, ]

# put everything in a data.frame 
data.frame(col_name = rep(colnames(distdat), each = 3),
           row_name = row.names(distdat)[c(idx)],
           min_val = distdat[cbind(c(idx), rep(1:3, each = 3))]
)

Also, I can't reproduce your example with the given seed; let me know if I've missed something.
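The last step of this answer indexes a data frame with a two-column matrix, where each matrix row is one (row, column) pair. A small sketch of that mechanism on a hypothetical 3 × 2 toy data frame `df` (not the question's data):

```r
# Hypothetical toy data, only to illustrate the indexing
df <- data.frame(p = c(5, 1, 3), q = c(20, 40, 10))

# order() per column gives row positions sorted by value; keep the first two
idx <- sapply(df, order)[1:2, ]      # p column: 2, 3; q column: 3, 1

# c(idx) reads idx column by column; pair each position with its column number
pairs <- cbind(c(idx), rep(1:2, each = 2))
vals <- df[pairs]
vals                                 # 1 3 10 20
```

Because `df[m]` with a matrix `m` goes through `as.matrix(df)`, this returns numbers only when all columns are numeric, as they are in `distdat`.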

Answer 1 (score: 0)

It isn't pretty, but it works:

library(dplyr)   # select(), arrange(), mutate(), rename(), bind_rows() and %>%

set.seed(123)
x <- rnorm(10, 15,1)
y <- rnorm(10, 7,0.1)
z <- rnorm(10, 3,0.01)

distdat <- data.frame(x,y,z)
rownames(distdat) <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j")

distdat$row_name <- rownames(distdat)

select(distdat, x, row_name) %>%
  arrange(x) %>% 
  head(3) %>% 
  mutate(col_name='x') %>%
  rename(min_val = x) -> newdat_x

select(distdat, y, row_name) %>%
  arrange(y) %>% 
  head(3) %>% 
  mutate(col_name='y') %>%
  rename(min_val = y) -> newdat_y

select(distdat, z, row_name) %>%
  arrange(z) %>% 
  head(3) %>% 
  mutate(col_name='z') %>%
  rename(min_val = z) -> newdat_z

newdat <- bind_rows(newdat_x, newdat_y, newdat_z)

Of course, we could (and should) write a function that creates the newdat_ data frames and then run it for each of the variables x, y, z.
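A minimal sketch of that function, assuming dplyr 1.0 or later (for `all_of()` and the `.data` pronoun). `smallest3` is a hypothetical name; the pipeline is the same as above, with the column passed as a string so one definition covers x, y and z.

```r
library(dplyr)

set.seed(123)
x <- rnorm(10, 15, 1)
y <- rnorm(10, 7, 0.1)
z <- rnorm(10, 3, 0.01)
distdat <- data.frame(x, y, z)
rownames(distdat) <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j")
distdat$row_name <- rownames(distdat)

# Hypothetical helper: three smallest values of the column named by `col`
smallest3 <- function(df, col) {
  df %>%
    select(all_of(col), row_name) %>%
    arrange(.data[[col]]) %>%        # look the column up by its name
    head(3) %>%
    mutate(col_name = col) %>%       # `col` is the string argument here
    rename(min_val = all_of(col))
}

# One call per variable, then stack, as bind_rows(newdat_x, newdat_y, newdat_z) did
newdat <- bind_rows(lapply(c("x", "y", "z"), smallest3, df = distdat))
newdat
```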

Answer 2 (score: 0)

You can do this with the dplyr and tidyr packages. They make the transformation more readable.

library(dplyr)
library(tidyr)   # gather()

newdat <- distdat %>%
  mutate(row = rownames(.)) %>%
  gather(col, dist, -row) %>%
  group_by(col) %>%
  arrange(col, dist) %>%
  top_n(-3, dist)
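With newer package versions, `top_n()` (superseded in dplyr 1.0) and `gather()` (superseded in tidyr 1.0) can be swapped for `slice_min()` and `pivot_longer()`. A sketch of the same pipeline under that assumption, using `tibble::rownames_to_column()` to keep the row names; it is an alternative, not the answer's original code.

```r
library(dplyr)
library(tidyr)
library(tibble)

set.seed(123)
x <- rnorm(10, 15, 1)
y <- rnorm(10, 7, 0.1)
z <- rnorm(10, 3, 0.01)
distdat <- data.frame(x, y, z)
rownames(distdat) <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j")

newdat <- distdat %>%
  rownames_to_column("row") %>%                               # row names -> column
  pivot_longer(-row, names_to = "col", values_to = "dist") %>%
  group_by(col) %>%
  slice_min(dist, n = 3) %>%                                  # 3 smallest per column
  ungroup()
newdat
```

By default `slice_min()` keeps ties, so a group could return more than three rows if values were tied; with continuous `rnorm()` data that does not happen in practice.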