Sorting the columns of a data frame

Time: 2015-04-11 12:34:49

Tags: r sorting dataframe col

Possible duplicate: How to sort a dataframe by column(s)?

I have a data frame whose columns can arrive in different orders.

       ID_REF      VALUE       ABS_CALL           DETECTION.P.VALUE
1    10071_s_at 3473.60000        P/present       0.000219000
2       1053_at  643.20000        P/present       0.000673000
3        117_at  564.00000        M/Marginal      0.000322000
4     1255_g_at    9.40000        A/absent        0.602006000
5       1294_at  845.60000        P/present       0.000468000
6       1320_at   94.30000        A/absent        0.204022000

Now the same data with the columns in this order:

 VALUE      ID_REF     ABS_CALL           DETECTION P-VALUE
1  3473.6   10071_s_at  P/present          0.000219
2  643.2    1053_at     P/present          0.000673
3  564      117_at      M/marginal         0.000322
4  9.4      1255_g_at   A/absent           0.602006
5  845.6    1294_at     P/present          0.000468
6  94.3     1320_at     A/absent           0.204022

And in yet another order:

   DETECTION P-VALUE    VALUE   ID_REF     ABS_CALL
1  0.000219             3473.6  10071_s_at  P
2  0.000673             643.2   1053_at     P
3  0.000322             564     117_at      M
4  0.602006             9.4     1255_g_at   A
5  0.000468             845.6   1294_at     P
6  0.204022             94.3    1320_at     A

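These are the same data frame with the columns in different orders. I don't know in advance which order a given data frame will use, so I want to put the data into the following format: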

      ID_REF      VALUE       ABS_CALL           DETECTION.P.VALUE
1    10071_s_at 3473.60000        P/present       0.000219000
2       1053_at  643.20000        P/present       0.000673000
3        117_at  564.00000        M/Marginal      0.000322000
4     1255_g_at    9.40000        A/absent        0.602006000
5       1294_at  845.60000        P/present       0.000468000
6       1320_at   94.30000        A/absent        0.204022000

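Here I need to check whether any column contains the substring _at and, if so, make it the first column. If a column has values greater than 1, it should become the second column. If a column has the levels P, A, M, present, absent, or marginal, it should become the third column. Finally, the column whose values are less than 1 should be the last column. Can anyone tell me how to do this efficiently in R?

Note: the column names are not fixed; they can be anything (different names).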

1 answer:

Answer 0: (score: 2)

This isn't elegant by any means, but you can write a function that looks at just a single row and checks each condition:

dat1 <- read.table(header = TRUE, check.names = FALSE,
text="ID_REF      VALUE       ABS_CALL           DETECTION.P.VALUE
1    10071_s_at 3473.60000        P/present       0.000219000
2       1053_at  643.20000        P/present       0.000673000
3        117_at  564.00000        M/Marginal      0.000322000
4     1255_g_at    9.40000        A/absent        0.602006000
5       1294_at  845.60000        P/present       0.000468000
6       1320_at   94.30000        A/absent        0.204022000")

dat2 <- read.table(header = TRUE, check.names = FALSE,
text="VALUE      ID_REF     ABS_CALL           'DETECTION P-VALUE'
1  3473.6   10071_s_at  P/present          0.000219
2  643.2    1053_at     P/present          0.000673
3  564      117_at      M/marginal         0.000322
4  9.4      1255_g_at   A/absent           0.602006
5  845.6    1294_at     P/present          0.000468
6  94.3     1320_at     A/absent           0.204022")

dat3 <- read.table(header = TRUE, check.names = FALSE,
text="   'DETECTION P-VALUE'    VALUE   ID_REF     ABS_CALL
1  0.000219             3473.6  10071_s_at  P
2  0.000673             643.2   1053_at     P
3  0.000322             564     117_at      M
4  0.602006             9.4     1255_g_at   A
5  0.000468             845.6   1294_at     P
6  0.204022             94.3    1320_at     A")


f <- function(data) {
  d1 <- data[1, , drop = FALSE]
  ## from row 1, separate out the numeric columns; the one < 1 is
  ## taken to be the p-value and the other is assumed to be the value
  nums <- d1[, nn <- sapply(d1, is.numeric), drop = FALSE]
  p <- names(nums)[unlist(nums) < 1]
  val <- setdiff(names(nums), p)

  ## take all the other columns, find `_at$` as id
  ## and assume the other column is `abs_call`
  ch <- d1[, !nn, drop = FALSE]
  id <- names(ch[, grepl('_at$', as.character(unlist(ch))), drop = FALSE])
  abs <- setdiff(names(ch), id)

  ## order by the name found
  data[, c(id, val, abs, p)]
}

lapply(list(dat1, dat2, dat3), f)

# [[1]]
#       ID_REF  VALUE   ABS_CALL DETECTION.P.VALUE
# 1 10071_s_at 3473.6  P/present          0.000219
# 2    1053_at  643.2  P/present          0.000673
# 3     117_at  564.0 M/Marginal          0.000322
# 4  1255_g_at    9.4   A/absent          0.602006
# 5    1294_at  845.6  P/present          0.000468
# 6    1320_at   94.3   A/absent          0.204022
# 
# [[2]]
#       ID_REF  VALUE   ABS_CALL DETECTION P-VALUE
# 1 10071_s_at 3473.6  P/present          0.000219
# 2    1053_at  643.2  P/present          0.000673
# 3     117_at  564.0 M/marginal          0.000322
# 4  1255_g_at    9.4   A/absent          0.602006
# 5    1294_at  845.6  P/present          0.000468
# 6    1320_at   94.3   A/absent          0.204022
# 
# [[3]]
#       ID_REF  VALUE ABS_CALL DETECTION P-VALUE
# 1 10071_s_at 3473.6        P          0.000219
# 2    1053_at  643.2        P          0.000673
# 3     117_at  564.0        M          0.000322
# 4  1255_g_at    9.4        A          0.602006
# 5    1294_at  845.6        P          0.000468
# 6    1320_at   94.3        A          0.204022
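
Because f only inspects the first row, it would misplace columns if, say, the first VALUE happened to be below 1. A possible variant that checks every row of each column is sketched below. The name reorder_by_content, the regular expression for the call labels, and the rule that a p-value column lies entirely in [0, 1] are assumptions for illustration, not part of the original answer.

```r
# Hypothetical helper: classify each column from all of its values,
# then return the columns in the order id, value, call, p-value.
reorder_by_content <- function(data) {
  is_num <- vapply(data, is.numeric, logical(1))
  as_chr <- lapply(data, as.character)  # also handles factor columns

  # id: non-numeric column in which every entry ends in "_at"
  is_id <- !is_num &
    vapply(as_chr, function(x) all(grepl("_at$", x)), logical(1))

  # call: non-numeric column holding P/A/M or P/present-style labels
  # (assumed pattern, matched case-insensitively)
  call_re <- "^[PAM](/(present|absent|marginal))?$"
  is_call <- !is_num &
    vapply(as_chr,
           function(x) all(grepl(call_re, x, ignore.case = TRUE)),
           logical(1))

  # p-value: numeric column whose values all lie in [0, 1] (assumption)
  is_p <- vapply(data,
                 function(x) is.numeric(x) &&
                   all(x >= 0 & x <= 1, na.rm = TRUE),
                 logical(1))

  # value: the remaining numeric column
  is_val <- is_num & !is_p

  roles <- list(id = is_id, value = is_val, call = is_call, p = is_p)
  counts <- vapply(roles, sum, integer(1))
  if (any(counts != 1L)) {
    stop("could not identify exactly one column per role: ",
         paste(names(counts), counts, sep = "=", collapse = ", "))
  }
  data[, vapply(roles, which, integer(1)), drop = FALSE]
}

# Made-up column names, since the question says names are arbitrary
scrambled <- data.frame(
  pval   = c(0.000219, 0.602006),
  signal = c(3473.6, 9.4),
  probe  = c("10071_s_at", "1255_g_at"),
  flag   = c("P", "A"),
  stringsAsFactors = FALSE
)
names(reorder_by_content(scrambled))
```

The function stops with an error rather than guessing when a role matches zero or several columns, for example when every VALUE entry also falls in [0, 1].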