Sort the rows of a data frame by the number inside a factor

Posted: 2018-05-08 14:23:20

Tags: r

I have several data frames that I bound together into one data frame, `final`, which contains two variables: "Label" and "Mean".

The labels are in this format:

>                                               Label       Mean
>1       C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (10) 18.97021 
>2       C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (11) 16.40476
>3       C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (12) 24.79132
>4       C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (13) 20.95391
>5       C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (14) 19.67626
>6       C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (15) 28.93776

I would like to organize the data by the number in Label, like this:

>                                              Label       Mean
>1       C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (1) 18.97021
>2       C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (2) 16.40476
>3       C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (3) 24.79132
>4       C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (4) 20.95391
>5       C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (5) 19.67626
>6       C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (6) 28.93776

Any suggestions on how to accomplish this? Thanks.

4 answers:

Answer 0 (score: 3)

Use `mixedorder` from gtools:

df[gtools::mixedorder(df$Label),]
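For readers without gtools, here is a minimal base R sketch of why a numeric key is needed at all. The three short labels are hypothetical stand-ins for the real `Label` values, and the sketch assumes every label ends in a parenthesized integer.

```r
# Hypothetical labels, shortened for illustration; each ends in "(<integer>)"
labels <- c("MDAMB231 (10)", "MDAMB231 (9)", "MDAMB231 (11)")

# Plain order() compares strings character by character,
# so "(10)" and "(11)" sort before "(9)"
lexical <- labels[order(labels)]
print(lexical)

# Pull out the integer inside the trailing parentheses and sort on that instead
num <- as.numeric(sub(".*\\((\\d+)\\)\\s*$", "\\1", labels))
numeric_sorted <- labels[order(num)]
print(numeric_sorted)
```

`mixedorder` handles the same problem more generally by splitting strings into text and number chunks; the `sub()` key above only works because the number sits in a fixed position at the end of each label.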

Answer 1 (score: 1)

Here is a solution that extracts the number inside "()" using strsplit:

Example input data:

df<-data.frame(Label=c("C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (12)",
                        "C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (11)",
                        "C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (10)"),
                Mean=c(1,2,3))

df
                                           Label Mean
1 C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (12)    1
2 C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (11)    2
3 C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (10)    3

Sorting:

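# Split on "(", keep the part after it (e.g. "12)"), strip the ")" and sort numerically
df[order(as.numeric(unlist(strsplit(unlist(lapply(strsplit(as.character(df$Label),split="(",fixed=TRUE),"[",2)),split=")",fixed=TRUE)))),]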
                                           Label Mean
3 C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (10)    3
2 C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (11)    2
1 C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (12)    1

Answer 2 (score: 1)

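I first create a new variable containing the digits that follow the first opening parenthesis, without the parenthesis itself. Then I order the data frame by that variable.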

library(stringr)

# Digits directly after "(" (the lookbehind excludes the parenthesis itself)
df$label_id <- as.numeric(str_extract(df$Label, '(?<=\\()\\d+'))
df <- df[order(df$label_id), ]
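If stringr is not available, base R can apply the same lookbehind through `regexpr()` with `perl = TRUE`. This is a sketch on the example data from Answer 1, and it assumes every label contains a number after "(": `regmatches()` silently drops non-matching elements, which would misalign the new column.

```r
df <- data.frame(
  Label = c("C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (12)",
            "C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (11)",
            "C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (10)"),
  Mean = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Locate the first run of digits preceded by "(" (PCRE lookbehind)
m <- regexpr("(?<=\\()\\d+", df$Label, perl = TRUE)

# Extract the matched digits and convert them to numbers
df$label_id <- as.numeric(regmatches(df$Label, m))

# Order the rows by the extracted number
df <- df[order(df$label_id), ]
print(df)
```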

Answer 3 (score: 0)

Here is a dplyr approach that sorts by the number in Label and then mutates Label to renumber it:

library(magrittr)
ans <- df %>%
        dplyr::arrange(as.numeric(gsub(".*\\((\\d+)\\)$", "\\1", Label))) %>%
        dplyr::mutate(Label = paste0(gsub("(.*)\\(\\d+\\)$", "\\1", Label), "(", dplyr::row_number(), ")"))

                                          # Label     Mean
# 1 C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (1) 18.97021
# 2 C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (2) 16.40476
# 3 C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (3) 24.79132
# 4 C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (4) 20.95391
# 5 C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (5) 19.67626
# 6 C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (6) 28.93776

Data

df <- read.table(text="Label,Mean
C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (10),18.97021 
C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (11),16.40476
C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (12),24.79132
C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (13),20.95391
C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (14),19.67626
C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (15),28.93776", header=TRUE, sep=",", stringsAsFactors=FALSE)
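The same sort-then-renumber idea can be sketched in base R without dplyr or magrittr. The three rows below are a shuffled subset of the data above, chosen only so the sort has something to do; like the dplyr version, it assumes each label ends in "(<integer>)".

```r
df <- data.frame(
  Label = c("C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (12)",
            "C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (10)",
            "C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (11)"),
  Mean = c(24.79132, 18.97021, 16.40476),
  stringsAsFactors = FALSE
)

# Sort rows by the integer inside the trailing parentheses
num <- as.numeric(sub(".*\\((\\d+)\\)$", "\\1", df$Label))
ans <- df[order(num), ]

# Drop the old "(n)" suffix and append the new position 1, 2, 3, ...
prefix <- sub("\\(\\d+\\)$", "", ans$Label)
ans$Label <- paste0(prefix, "(", seq_len(nrow(ans)), ")")
rownames(ans) <- NULL
print(ans)
```

Each Mean stays attached to its original row; only the number in the label is replaced by the row's new position.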