I have a data frame in R; this is part of it:
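What I want is to sort the data frame by the most frequent element in the second column (the one that occurs more often). This is the desired result: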
Kif21a PTHR24115 ENSMUSG00000022629
Acss3 PTHR24115 ENSMUSG00000035948
Nr1h4 PTHR24082 ENSMUSG00000047638
Rarg PTHR24082 ENSMUSG00000001288
Vdr PTHR24082 ENSMUSG00000022479
Pamr1 PTHR24254 ENSMUSG00000027188
Thanks a lot!
Answer 0 (score: 1)
One option is
library(dplyr)
df1 %>%
group_by(col2) %>%
mutate(n = n()) %>%
ungroup %>%
arrange(desc(n))
Another option is add_count
df1 %>%
add_count(col2) %>%
arrange(desc(n))
# A tibble: 6 x 4
# col1 col2 col3 n
# <chr> <chr> <chr> <int>
#1 Nr1h4 PTHR24082 ENSMUSG00000047638 3
#2 Rarg PTHR24082 ENSMUSG00000001288 3
#3 Vdr PTHR24082 ENSMUSG00000022479 3
#4 Kif21a PTHR24115 ENSMUSG00000022629 2
#5 Acss3 PTHR24115 ENSMUSG00000035948 2
#6 Pamr1 PTHR24254 ENSMUSG00000027188 1
Or use base R with ave
df1[with(df1, order(-ave(seq_along(col2), col2, FUN = length))),]
df1 <- structure(list(col1 = c("Kif21a", "Acss3", "Nr1h4", "Rarg", "Vdr",
"Pamr1"), col2 = c("PTHR24115", "PTHR24115", "PTHR24082", "PTHR24082",
"PTHR24082", "PTHR24254"), col3 = c("ENSMUSG00000022629", "ENSMUSG00000035948",
"ENSMUSG00000047638", "ENSMUSG00000001288", "ENSMUSG00000022479",
"ENSMUSG00000027188")), class = "data.frame", row.names = c(NA,
-6L))
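To make the ave step easier to follow, here is a minimal sketch that computes the group sizes as a separate vector before sorting. The names counts and sorted are illustrative, and using col2 as a second sort key is an addition of this sketch (it keeps equally frequent groups together), not part of the answer above.

```r
df1 <- data.frame(
  col1 = c("Kif21a", "Acss3", "Nr1h4", "Rarg", "Vdr", "Pamr1"),
  col2 = c("PTHR24115", "PTHR24115", "PTHR24082", "PTHR24082",
           "PTHR24082", "PTHR24254"),
  col3 = c("ENSMUSG00000022629", "ENSMUSG00000035948", "ENSMUSG00000047638",
           "ENSMUSG00000001288", "ENSMUSG00000022479", "ENSMUSG00000027188"),
  stringsAsFactors = FALSE
)

# ave() replaces every element with the size of its col2 group
counts <- ave(seq_along(df1$col2), df1$col2, FUN = length)

# Sort by descending group size; col2 as a second key (an assumption of this
# sketch) keeps groups with the same size from interleaving
sorted <- df1[order(-counts, df1$col2), ]
```

Printing sorted should show the three PTHR24082 rows first, then the two PTHR24115 rows, then PTHR24254.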
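Answer 1 (score: 1)
If your columns are named A, B, C, you can use the following code. This adds a column N to df, so if you do not want that, you can either add df <- at the beginning so that the output overwrites df, or replace df with copy(df).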
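The code for this answer is not included above. As a minimal sketch of what an approach along these lines might look like: the use of data.table is an assumption inferred from the mention of copy(df), the sample values are borrowed from the first answer, and the name sorted is illustrative.

```r
library(data.table)

df <- data.frame(
  A = c("Kif21a", "Acss3", "Nr1h4", "Rarg", "Vdr", "Pamr1"),
  B = c("PTHR24115", "PTHR24115", "PTHR24082", "PTHR24082",
        "PTHR24082", "PTHR24254"),
  C = c("ENSMUSG00000022629", "ENSMUSG00000035948", "ENSMUSG00000047638",
        "ENSMUSG00000001288", "ENSMUSG00000022479", "ENSMUSG00000027188"),
  stringsAsFactors = FALSE
)

setDT(df)                # convert to a data.table by reference
df[, N := .N, by = B]    # add the group size N to df itself (by reference)
sorted <- df[order(-N)]  # rows ordered by descending group size
```

Because := modifies df in place, df keeps the N column afterwards; working on copy(df) instead leaves the original untouched.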
Answer 2 (score: 0)
Using base R:
df <- as.data.frame(matrix(c("Kif21a", "PTHR24115", "ENSMUSG00000022629",
"Acss3", "PTHR24115", "ENSMUSG00000035948",
"Nr1h4", "PTHR24082", "ENSMUSG00000047638",
"Rarg", "PTHR24082", "ENSMUSG00000001288",
"Vdr", "PTHR24082", "ENSMUSG00000022479",
"Pamr1", "PTHR24254", "ENSMUSG00000027188"), ncol = 3, byrow = TRUE))
V1 V2 V3
1 Kif21a PTHR24115 ENSMUSG00000022629
2 Acss3 PTHR24115 ENSMUSG00000035948
3 Nr1h4 PTHR24082 ENSMUSG00000047638
4 Rarg PTHR24082 ENSMUSG00000001288
5 Vdr PTHR24082 ENSMUSG00000022479
6 Pamr1 PTHR24254 ENSMUSG00000027188
tmp <- table(df$V2)
# as.character() makes the lookup work whether V2 is a factor or a character column
df[order(tmp[as.character(df$V2)], decreasing = TRUE),]
V1 V2 V3
3 Nr1h4 PTHR24082 ENSMUSG00000047638
4 Rarg PTHR24082 ENSMUSG00000001288
5 Vdr PTHR24082 ENSMUSG00000022479
1 Kif21a PTHR24115 ENSMUSG00000022629
2 Acss3 PTHR24115 ENSMUSG00000035948
6 Pamr1 PTHR24254 ENSMUSG00000027188
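Answer 3 (score: 0)
A base R approach is to count the occurrences of V2 with table, sort the counts in decreasing order with sort, convert them to a data frame with stack, and merge that with the original data frame:
merge(df, stack(sort(table(df$V2), decreasing = TRUE)), by.x = "V2", by.y = "ind")
# V2 V1 V3 values
#1 PTHR24082 Nr1h4 ENSMUSG00000047638 3
#2 PTHR24082 Rarg ENSMUSG00000001288 3
#3 PTHR24082 Vdr ENSMUSG00000022479 3
#4 PTHR24115 Kif21a ENSMUSG00000022629 2
#5 PTHR24115 Acss3 ENSMUSG00000035948 2
#6 PTHR24254 Pamr1 ENSMUSG00000027188 1
If it is not needed, you can drop the values column, which holds the frequency count of each V2 value. Note that merge sorts its result by the key column by default, so order the result by values explicitly if the row order matters.
With dplyr, we can use inner_join.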
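The dplyr code for this answer is not shown here. A minimal sketch of one way to combine a count with inner_join follows; the exact pipeline is an assumption rather than the answerer's code, and the name result is illustrative.

```r
library(dplyr)

df <- data.frame(
  V1 = c("Kif21a", "Acss3", "Nr1h4", "Rarg", "Vdr", "Pamr1"),
  V2 = c("PTHR24115", "PTHR24115", "PTHR24082", "PTHR24082",
         "PTHR24082", "PTHR24254"),
  V3 = c("ENSMUSG00000022629", "ENSMUSG00000035948", "ENSMUSG00000047638",
         "ENSMUSG00000001288", "ENSMUSG00000022479", "ENSMUSG00000027188"),
  stringsAsFactors = FALSE
)

# count() gives one row per V2 value with its frequency in column n;
# joining it back attaches that frequency to every original row
result <- df %>%
  inner_join(count(df, V2), by = "V2") %>%
  arrange(desc(n))
```

If the helper column is not wanted in the final output, select(-n) can be appended to the pipeline.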