Ordering a data frame

Time: 2018-05-24 10:25:35

Tags: r sorting

My data frame looks like this:

              gene sample       value     score
1    2310043M15Rik  1a_S1 0.035023917 0.8192723
2    2310043M15Rik  2a_S2 0.030513262 0.8192723
3    2310043M15Rik  3a_S3 0.043984305 0.8192723
4    2310043M15Rik  1b_S1 0.000000000 0.8192723
5    2310043M15Rik  2b_S2 0.000000000 0.8192723
6    2310043M15Rik  3b_S3 0.000000000 0.8192723
7    2310043M15Rik   4_S4 0.541528427 0.8192723
8    2310043M15Rik   5_S5 0.601787500 0.8192723
9    2310043M15Rik   6_S6 0.672417814 0.8192723
10   2310043M15Rik 10_S10 1.791885603 0.8192723
11   2310043M15Rik 11_S11 2.001114749 0.8192723
12   2310043M15Rik 12_S12 1.700699778 0.8192723
13   2310043M15Rik 16_S16 3.279904599 0.8192723
14   2310043M15Rik 17_S17 3.389471358 0.8192723
15   2310043M15Rik 18_S18 3.417522968 0.8192723
16   2310043M15Rik 22_S22 2.578413695 0.8192723
17   2310043M15Rik 23_S23 1.977315641 0.8192723
18   2310043M15Rik 24_S24 1.951025717 0.8192723
19   2310043M15Rik 28_S28 3.344688860 0.8192723
20   2310043M15Rik 29_S29 2.768640841 0.8192723
21   2310043M15Rik 30_S30 2.737122410 0.8192723
22   2310043M15Rik 34_S34 3.851056653 0.8192723
23   2310043M15Rik 35_S35 3.532010607 0.8192723
24   2310043M15Rik 36_S36 3.590795543 0.8192723
25   5730508B09Rik  1a_S1 1.146767967 0.8029265
26   5730508B09Rik  2a_S2 0.678569811 0.8029265
27   5730508B09Rik  3a_S3 0.756856431 0.8029265
28   5730508B09Rik  1b_S1 1.131529434 0.8029265
29   5730508B09Rik  2b_S2 0.824058995 0.8029265
30   5730508B09Rik  3b_S3 0.780254355 0.8029265
31   5730508B09Rik   4_S4 1.014725971 0.8029265
32   5730508B09Rik   5_S5 1.152045200 0.8029265
33   5730508B09Rik   6_S6 0.969898879 0.8029265

I want to sort the data frame by each gene's score. I tried the following:

c1m.tcps_up$gene <- factor(c1m.tcps_up$gene, 
                       levels = c1m.tcps_up$gene [order(c1m.tcps_up$score)])

where c1m.tcps_up is the data frame, but it still returns the following error:

Error in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels,  : 

  factor level [2] is duplicated

I'm still new to R, and I'd like to understand what is actually happening and where I went wrong.

Thanks!

3 answers:

Answer 0: (score: 1)

You can use dplyr:

library(dplyr)

c1m.tcps_up %>% 
  group_by(gene) %>% 
  arrange(score)

## A tibble: 33 x 4
## Groups:   gene [2]
#   gene          sample  value score
#   <fct>         <fct>   <dbl> <dbl>
# 1 5730508B09Rik 1a_S1  1.15   0.803
# 2 5730508B09Rik 2a_S2  0.679  0.803
# 3 5730508B09Rik 3a_S3  0.757  0.803
# 4 5730508B09Rik 1b_S1  1.13   0.803
# 5 5730508B09Rik 2b_S2  0.824  0.803
# 6 5730508B09Rik 3b_S3  0.780  0.803
# 7 5730508B09Rik 4_S4   1.01   0.803
# 8 5730508B09Rik 5_S5   1.15   0.803
# 9 5730508B09Rik 6_S6   0.970  0.803
#10 2310043M15Rik 1a_S1  0.0350 0.819
## ... with 23 more rows

If you want descending order, wrap the column in desc() inside arrange:

c1m.tcps_up %>% 
  group_by(gene) %>% 
  arrange(desc(score))

## A tibble: 33 x 4
## Groups:   gene [2]
#   gene          sample  value score
#   <fct>         <fct>   <dbl> <dbl>
# 1 2310043M15Rik 1a_S1  0.0350 0.819
# 2 2310043M15Rik 2a_S2  0.0305 0.819
# 3 2310043M15Rik 3a_S3  0.0440 0.819
# 4 2310043M15Rik 1b_S1  0      0.819
# 5 2310043M15Rik 2b_S2  0      0.819
# 6 2310043M15Rik 3b_S3  0      0.819
# 7 2310043M15Rik 4_S4   0.542  0.819
# 8 2310043M15Rik 5_S5   0.602  0.819
# 9 2310043M15Rik 6_S6   0.672  0.819
#10 2310043M15Rik 10_S10 1.79   0.819
## ... with 23 more rows
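
One detail about the pipelines above may be worth illustrating: arrange() ignores grouping by default, so group_by(gene) does not change the row order on its own. The sketch below uses a small made-up data frame (the names df, ungrouped_sort and grouped_sort and all values are hypothetical, not taken from the question) to contrast the default with .by_group = TRUE, which sorts by the grouping variable first.

```r
library(dplyr)

# Hypothetical toy data, not the data from the question
df <- data.frame(
  gene  = c("geneA", "geneA", "geneB", "geneB"),
  value = c(3, 1, 2, 4),
  stringsAsFactors = FALSE
)

# Default: the grouping is ignored, rows are sorted by value alone
ungrouped_sort <- df %>%
  group_by(gene) %>%
  arrange(value)

# .by_group = TRUE: rows are sorted by gene first, then by value within each gene
grouped_sort <- df %>%
  group_by(gene) %>%
  arrange(value, .by_group = TRUE)
```

In the question's data every gene has a single score, so sorting by score already keeps each gene's rows together and the distinction does not change the result there.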

Answer 1: (score: 0)

If you only need to order the data frame, order or dplyr::arrange can help:

c1m.tcps_up[order(c1m.tcps_up$score), ]

c1m.tcps_up %>% dplyr::arrange(score)

If each gene can have more than one score, please explain how you need the data ordered.
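
As a minimal sketch of one possible ordering when a single key is not enough, order() accepts several keys, and negating a numeric key sorts it in descending order. The data frame df and its values below are made up for illustration, and sorting by descending score with ties broken by ascending value is only an assumed requirement.

```r
# Hypothetical toy data, not the data from the question
df <- data.frame(
  gene  = c("geneB", "geneA", "geneA", "geneB"),
  value = c(1, 5, 2, 3),
  score = c(0.5, 0.9, 0.9, 0.5),
  stringsAsFactors = FALSE
)

# Sort by score (descending), then by value (ascending) within equal scores
by_score_then_value <- df[order(-df$score, df$value), ]
```

The same idea carries over to dplyr as arrange(desc(score), value).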

Answer 2: (score: 0)

Solution:

First, sort the rows by score:

c1m.tcps_up <- c1m.tcps_up[order(c1m.tcps_up$score), ]

Then, if necessary, create a factor:

c1m.tcps_up$gene <- factor(c1m.tcps_up$gene)

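About the error itself:

Factor levels must be unique. Your levels argument contained one entry per row, so each gene appeared many times. You could wrap the ordered gene column in unique(); calling factor() without a levels argument, as above, also removes the duplicates for you. Note, however, that without a levels argument the levels follow the sorted order of the distinct values (or the existing level order if gene is already a factor), not the row order.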
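
A minimal sketch of the unique() approach, for the case where the factor levels themselves should follow the score order. The data frame df, the helper name gene_order and all values are made up for illustration; the only change from the attempt in the question is the unique() call around the ordered gene names.

```r
# Hypothetical toy data, not the data from the question
df <- data.frame(
  gene  = rep(c("geneA", "geneB", "geneC"), each = 2),
  score = rep(c(0.8, 0.3, 0.5), each = 2),
  stringsAsFactors = FALSE
)

# Gene names in order of ascending score, each gene kept only once
gene_order <- unique(df$gene[order(df$score)])

# Levels are now unique, so factor() no longer raises the duplicate-level error
df$gene <- factor(df$gene, levels = gene_order)

# Ordering by the factor sorts rows by level position, i.e. by score
df_sorted <- df[order(df$gene), ]
```

Setting the levels this way is mainly useful when the gene order has to carry over to something that respects factor levels, such as plot axes. For a one-off sort of the rows, order(score) alone is enough.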