My data frame looks like this:
gene sample value score
1 2310043M15Rik 1a_S1 0.035023917 0.8192723
2 2310043M15Rik 2a_S2 0.030513262 0.8192723
3 2310043M15Rik 3a_S3 0.043984305 0.8192723
4 2310043M15Rik 1b_S1 0.000000000 0.8192723
5 2310043M15Rik 2b_S2 0.000000000 0.8192723
6 2310043M15Rik 3b_S3 0.000000000 0.8192723
7 2310043M15Rik 4_S4 0.541528427 0.8192723
8 2310043M15Rik 5_S5 0.601787500 0.8192723
9 2310043M15Rik 6_S6 0.672417814 0.8192723
10 2310043M15Rik 10_S10 1.791885603 0.8192723
11 2310043M15Rik 11_S11 2.001114749 0.8192723
12 2310043M15Rik 12_S12 1.700699778 0.8192723
13 2310043M15Rik 16_S16 3.279904599 0.8192723
14 2310043M15Rik 17_S17 3.389471358 0.8192723
15 2310043M15Rik 18_S18 3.417522968 0.8192723
16 2310043M15Rik 22_S22 2.578413695 0.8192723
17 2310043M15Rik 23_S23 1.977315641 0.8192723
18 2310043M15Rik 24_S24 1.951025717 0.8192723
19 2310043M15Rik 28_S28 3.344688860 0.8192723
20 2310043M15Rik 29_S29 2.768640841 0.8192723
21 2310043M15Rik 30_S30 2.737122410 0.8192723
22 2310043M15Rik 34_S34 3.851056653 0.8192723
23 2310043M15Rik 35_S35 3.532010607 0.8192723
24 2310043M15Rik 36_S36 3.590795543 0.8192723
25 5730508B09Rik 1a_S1 1.146767967 0.8029265
26 5730508B09Rik 2a_S2 0.678569811 0.8029265
27 5730508B09Rik 3a_S3 0.756856431 0.8029265
28 5730508B09Rik 1b_S1 1.131529434 0.8029265
29 5730508B09Rik 2b_S2 0.824058995 0.8029265
30 5730508B09Rik 3b_S3 0.780254355 0.8029265
31 5730508B09Rik 4_S4 1.014725971 0.8029265
32 5730508B09Rik 5_S5 1.152045200 0.8029265
33 5730508B09Rik 6_S6 0.969898879 0.8029265
I would like to order the data frame by each gene's score. I tried the following:
c1m.tcps_up$gene <- factor(c1m.tcps_up$gene,
levels = c1m.tcps_up$gene [order(c1m.tcps_up$score)])
where c1m.tcps_up is the data frame, but it returns the following error:
Error in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels, :
factor level [2] is duplicated
I am still new to R, and I would like to understand what is actually happening and where I went wrong.
Thanks!
Answer 0: (score: 1)
You can use dplyr:
library(dplyr)
c1m.tcps_up %>%
group_by(gene) %>%
arrange(score)
## A tibble: 33 x 4
## Groups: gene [2]
# gene sample value score
# <fct> <fct> <dbl> <dbl>
# 1 5730508B09Rik 1a_S1 1.15 0.803
# 2 5730508B09Rik 2a_S2 0.679 0.803
# 3 5730508B09Rik 3a_S3 0.757 0.803
# 4 5730508B09Rik 1b_S1 1.13 0.803
# 5 5730508B09Rik 2b_S2 0.824 0.803
# 6 5730508B09Rik 3b_S3 0.780 0.803
# 7 5730508B09Rik 4_S4 1.01 0.803
# 8 5730508B09Rik 5_S5 1.15 0.803
# 9 5730508B09Rik 6_S6 0.970 0.803
#10 2310043M15Rik 1a_S1 0.0350 0.819
## ... with 23 more rows
If you want descending order, wrap the column in desc() inside arrange():
c1m.tcps_up %>%
group_by(gene) %>%
arrange(desc(score))
## A tibble: 33 x 4
## Groups: gene [2]
# gene sample value score
# <fct> <fct> <dbl> <dbl>
# 1 2310043M15Rik 1a_S1 0.0350 0.819
# 2 2310043M15Rik 2a_S2 0.0305 0.819
# 3 2310043M15Rik 3a_S3 0.0440 0.819
# 4 2310043M15Rik 1b_S1 0 0.819
# 5 2310043M15Rik 2b_S2 0 0.819
# 6 2310043M15Rik 3b_S3 0 0.819
# 7 2310043M15Rik 4_S4 0.542 0.819
# 8 2310043M15Rik 5_S5 0.602 0.819
# 9 2310043M15Rik 6_S6 0.672 0.819
#10 2310043M15Rik 10_S10 1.79 0.819
## ... with 23 more rows
Answer 1: (score: 0)
If you only need to order the data frame, order or dplyr::arrange can help:
c1m.tcps_up[order(c1m.tcps_up$score), ]
or
c1m.tcps_up %>% dplyr::arrange(score)
If each gene can have more than one score, please clarify how you need the data ordered.
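If a gene did have several scores, one possible convention (an assumption here, not something the asker stated) would be to order genes by a per-gene summary such as the mean score. A minimal base R sketch on made-up data; the `df` object, its values and the `gene_mean` column are all hypothetical:

```r
# Toy data (hypothetical): each gene has more than one score
df <- data.frame(
  gene  = c("geneA", "geneA", "geneB", "geneB"),
  score = c(0.9, 0.5, 0.6, 0.6),
  stringsAsFactors = FALSE
)

# ave() returns the per-gene mean, repeated on every row of that gene
df$gene_mean <- ave(df$score, df$gene, FUN = mean)

# Order rows by the per-gene mean, breaking ties by gene name
df_sorted <- df[order(df$gene_mean, df$gene), ]
```

Any other summary (median, max) could be swapped in for `mean`, depending on what "ordered by score" should mean for the data.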
Answer 2: (score: 0)
Solution:
First, sort by score:
c1m.tcps_up <- c1m.tcps_up[order(c1m.tcps_up$score), ]
Then create a factor if necessary:
c1m.tcps_up$gene <- factor(c1m.tcps_up$gene)
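About the error itself:
Factor levels must be unique. Your levels argument contained one entry per row, so every gene name appeared many times. You could wrap the ordered gene column in unique(), but calling factor() as above already builds a unique set of levels for you. Note that factor() without a levels argument sorts the levels alphabetically, so if the level order itself should follow the score, pass levels = unique(...) explicitly.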
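To make this concrete, here is a minimal sketch on a small made-up data frame (the `df` object and its values are illustrative assumptions, not the asker's data). It reproduces the duplicated-levels error and then builds the factor with unique() so that the levels follow the score order:

```r
# Toy data (hypothetical), same shape as the question:
# one score per gene, repeated on every row of that gene
df <- data.frame(
  gene  = c("geneA", "geneA", "geneB", "geneB", "geneC", "geneC"),
  score = c(0.82, 0.82, 0.80, 0.80, 0.91, 0.91),
  stringsAsFactors = FALSE
)

# The original attempt: one level per row, so every gene name is repeated
res <- try(factor(df$gene, levels = df$gene[order(df$score)]), silent = TRUE)
failed <- inherits(res, "try-error")  # TRUE: factor levels are duplicated

# Sort the rows by score, then keep each gene name once, in order of appearance
df <- df[order(df$score), ]
df$gene <- factor(df$gene, levels = unique(df$gene))

levels(df$gene)
# "geneB" "geneA" "geneC"
```

Setting the levels this way matters mainly when something downstream (for example a plot axis) uses the level order rather than the row order.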