Possible duplicate: How to sort a dataframe by column(s)?
I have a data frame whose columns arrive in different orders.
ID_REF VALUE ABS_CALL DETECTION.P.VALUE
1 10071_s_at 3473.60000 P/present 0.000219000
2 1053_at 643.20000 P/present 0.000673000
3 117_at 564.00000 M/Marginal 0.000322000
4 1255_g_at 9.40000 A/absent 0.602006000
5 1294_at 845.60000 P/present 0.000468000
6 1320_at 94.30000 A/absent 0.204022000
Here is the same data with the columns in a different order:
VALUE ID_REF ABS_CALL DETECTION P-VALUE
1 3473.6 10071_s_at P/present 0.000219
2 643.2 1053_at P/present 0.000673
3 564 117_at M/marginal 0.000322
4 9.4 1255_g_at A/absent 0.602006
5 845.6 1294_at P/present 0.000468
6 94.3 1320_at A/absent 0.204022
And in yet another order:
DETECTION P-VALUE VALUE ID_REF ABS_CALL
1 0.000219 3473.6 10071_s_at P
2 0.000673 643.2 1053_at P
3 0.000322 564 117_at M
4 0.602006 9.4 1255_g_at A
5 0.000468 845.6 1294_at P
6 0.204022 94.3 1320_at A
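These are the same data frame with different column orders. I don't know the order in advance, so I want to put the data into the following format: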
ID_REF VALUE ABS_CALL DETECTION.P.VALUE
1 10071_s_at 3473.60000 P/present 0.000219000
2 1053_at 643.20000 P/present 0.000673000
3 117_at 564.00000 M/Marginal 0.000322000
4 1255_g_at 9.40000 A/absent 0.602006000
5 1294_at 845.60000 P/present 0.000468000
6 1320_at 94.30000 A/absent 0.204022000
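Here I need to check whether any column contains the substring _at and, if so, make it the first column. If a column has values greater than 1, it becomes the second column. If a column has the levels P, A, M or present, absent, marginal, it becomes the third column. Finally, the column whose values are less than 1 becomes the last column. Can anyone tell me how to do this efficiently in R?
Note: the column names are not fixed and can be anything (different names).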
Answer 0 (score: 2)
This isn't elegant by any means, but you could write a function that looks at a single row only and checks each condition:
dat1 <- read.table(header = TRUE, check.names = FALSE,
text="ID_REF VALUE ABS_CALL DETECTION.P.VALUE
1 10071_s_at 3473.60000 P/present 0.000219000
2 1053_at 643.20000 P/present 0.000673000
3 117_at 564.00000 M/Marginal 0.000322000
4 1255_g_at 9.40000 A/absent 0.602006000
5 1294_at 845.60000 P/present 0.000468000
6 1320_at 94.30000 A/absent 0.204022000")
dat2 <- read.table(header = TRUE, check.names = FALSE,
text="VALUE ID_REF ABS_CALL 'DETECTION P-VALUE'
1 3473.6 10071_s_at P/present 0.000219
2 643.2 1053_at P/present 0.000673
3 564 117_at M/marginal 0.000322
4 9.4 1255_g_at A/absent 0.602006
5 845.6 1294_at P/present 0.000468
6 94.3 1320_at A/absent 0.204022")
dat3 <- read.table(header = TRUE, check.names = FALSE,
text=" 'DETECTION P-VALUE' VALUE ID_REF ABS_CALL
1 0.000219 3473.6 10071_s_at P
2 0.000673 643.2 1053_at P
3 0.000322 564 117_at M
4 0.602006 9.4 1255_g_at A
5 0.000468 845.6 1294_at P
6 0.204022 94.3 1320_at A")
f <- function(data) {
  d1 <- data[1, , drop = FALSE]
  ## from row 1, separate out the numeric columns; the p-value is < 1
  ## and the other one is assumed to be the value
  nn <- sapply(d1, is.numeric)
  nums <- d1[, nn, drop = FALSE]
  p <- names(nums[, nums < 1, drop = FALSE])
  val <- setdiff(names(nums), p)
  ## take all the other columns, find `_at$` as the id
  ## and assume the remaining column is the abs_call
  ch <- d1[, !nn, drop = FALSE]
  id <- names(ch[, grepl('_at$', as.character(unlist(ch))), drop = FALSE])
  abs_call <- setdiff(names(ch), id)
  ## reorder the columns by the names found
  data[, c(id, val, abs_call, p)]
}
lapply(list(dat1, dat2, dat3), f)
# [[1]]
# ID_REF VALUE ABS_CALL DETECTION.P.VALUE
# 1 10071_s_at 3473.6 P/present 0.000219
# 2 1053_at 643.2 P/present 0.000673
# 3 117_at 564.0 M/Marginal 0.000322
# 4 1255_g_at 9.4 A/absent 0.602006
# 5 1294_at 845.6 P/present 0.000468
# 6 1320_at 94.3 A/absent 0.204022
#
# [[2]]
# ID_REF VALUE ABS_CALL DETECTION P-VALUE
# 1 10071_s_at 3473.6 P/present 0.000219
# 2 1053_at 643.2 P/present 0.000673
# 3 117_at 564.0 M/marginal 0.000322
# 4 1255_g_at 9.4 A/absent 0.602006
# 5 1294_at 845.6 P/present 0.000468
# 6 1320_at 94.3 A/absent 0.204022
#
# [[3]]
# ID_REF VALUE ABS_CALL DETECTION P-VALUE
# 1 10071_s_at 3473.6 P 0.000219
# 2 1053_at 643.2 P 0.000673
# 3 117_at 564.0 M 0.000322
# 4 1255_g_at 9.4 A 0.602006
# 5 1294_at 845.6 P 0.000468
# 6 1320_at 94.3 A 0.204022
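The function above inspects only the first row, so it would misplace a value column whose first entry happens to be below 1. A minimal sketch of a variant that checks every row of each column might look like the following. The name reorder_by_content, the call_pattern regular expression, and the choice to treat a numeric column as the p-value column when all of its entries lie in [0, 1] are assumptions made for illustration; they are not part of the original answer.

```r
# Hypothetical helper (not from the original answer): classify each column
# from all of its values, then reorder as id, value, call, p-value.
reorder_by_content <- function(data) {
  is_num <- vapply(data, is.numeric, logical(1))

  # Assumption: a numeric column is the p-value column when every entry
  # lies in [0, 1]; any other numeric column is treated as the value column.
  in_unit <- function(x) is.numeric(x) && all(x >= 0 & x <= 1, na.rm = TRUE)
  is_p   <- is_num & vapply(data, in_unit, logical(1))
  is_val <- is_num & !is_p

  # Non-numeric columns: an id column contains "_at" somewhere.
  has_at <- function(x) any(grepl("_at", as.character(x), fixed = TRUE))
  is_id  <- !is_num & vapply(data, has_at, logical(1))

  # Assumption: call labels look like P, A, M, present, or P/present.
  call_pattern <- "^(P|A|M)(/(present|absent|marginal))?$|^(present|absent|marginal)$"
  all_calls <- function(x) all(grepl(call_pattern, as.character(x), ignore.case = TRUE))
  is_call <- !is_num & !is_id & vapply(data, all_calls, logical(1))

  ord <- c(which(is_id), which(is_val), which(is_call), which(is_p))
  # Fail loudly if some column matched none of the rules.
  stopifnot(length(ord) == ncol(data))
  data[, ord, drop = FALSE]
}

# Made-up rows: the first VALUE entry is below 1 on purpose.
shuffled <- data.frame(
  `DETECTION P-VALUE` = c(0.000219, 0.602006),
  VALUE    = c(0.8, 3473.6),
  ID_REF   = c("10071_s_at", "1255_g_at"),
  ABS_CALL = c("P", "A"),
  check.names = FALSE, stringsAsFactors = FALSE
)
names(reorder_by_content(shuffled))
# [1] "ID_REF"            "VALUE"             "ABS_CALL"          "DETECTION P-VALUE"
```

The rules are mutually exclusive by construction, so each column lands in exactly one group. A value column whose entries all fall between 0 and 1 would still be taken for a p-value column, which is a limit of the content-based rules themselves rather than of this sketch.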