I have a dataset similar to this one (1000 IDs, 9 classes):
ID Class Value
1 A 0.014
1 B 0.665
1 C 0.321
2 A 0.234
2 B 0.424
2 C 0.342
... ... ...
The Value column holds (relative) abundances, i.e., for each individual the values across all classes sum to 1.
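A quick way to confirm the "sums to 1" property before plotting is to total Value within each ID. The following is a minimal base-R sketch. It uses only the rows shown above for IDs 1 and 2 as a stand-in `df`, and the tolerance of 1e-9 is an arbitrary choice to absorb floating-point error:

```r
# Stand-in data: the rows shown in the question for IDs 1 and 2
df <- data.frame(
  ID    = c(1, 1, 1, 2, 2, 2),
  Class = c("A", "B", "C", "A", "B", "C"),
  Value = c(0.014, 0.665, 0.321, 0.234, 0.424, 0.342)
)

# Sum Value within each ID; every total should be 1 (up to rounding)
totals <- tapply(df$Value, df$ID, sum)
print(totals)

# TRUE if every ID's abundances sum to 1 within the chosen tolerance
all(abs(totals - 1) < 1e-9)
```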
I'd like to create a ggplot geom_bar plot in R in which the x-axis is ordered not by ID but by decreasing class abundance, similar to this plot:
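In our example, suppose Class B is the most abundant class across all individuals, followed by Class C and finally Class A. Then the first bar on the x-axis would be the individual with the highest Class B value, the second bar the individual with the second-highest Class B value, and so on.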
This is what I tried:
ggplot(df, aes(x = ID, y = Value, fill = Class)) +
  geom_bar(stat = "identity") +
  xlab("") +
  ylab("Relative Abundance\n")
Answer:
You can reorder the data before passing it to ggplot():
library(dplyr)
library(ggplot2)

# sum the abundance for each class, across all IDs, & sort the result
sort.class <- df %>%
  count(Class, wt = Value) %>%
  arrange(desc(n)) %>%
  pull(Class)

# get ID order, sorted by each ID's abundance in the most abundant class
ID.order <- df %>%
  filter(Class == sort.class[1]) %>%
  arrange(desc(Value)) %>%
  pull(ID)

# factor ID / Class in the desired order
df %>%
  mutate(ID = factor(ID, levels = ID.order)) %>%
  mutate(Class = factor(Class, levels = rev(sort.class))) %>%
  ggplot(aes(x = ID, y = Value, fill = Class)) +
  geom_col(width = 1) # geom_col() is equivalent to geom_bar(stat = "identity")
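If you'd rather not depend on dplyr, the same two orderings (classes by total abundance across all IDs, IDs by their value in the top class) can be computed in base R. This is a minimal sketch, not part of the original answer; it uses the question's rows for IDs 1 and 2 as a stand-in `df`, and the name `class_totals` is illustrative:

```r
# Stand-in data: the rows shown in the question for IDs 1 and 2
df <- data.frame(
  ID    = c(1, 1, 1, 2, 2, 2),
  Class = c("A", "B", "C", "A", "B", "C"),
  Value = c(0.014, 0.665, 0.321, 0.234, 0.424, 0.342)
)

# Total abundance of each class across all IDs, most abundant first
class_totals <- tapply(df$Value, df$Class, sum)
sort.class <- names(class_totals)[order(class_totals, decreasing = TRUE)]

# Rows of the most abundant class, IDs ordered by decreasing Value
top <- df[df$Class == sort.class[1], ]
ID.order <- top$ID[order(top$Value, decreasing = TRUE)]

# Factor levels fix the x-axis order (ID) and the stacking order (Class)
df$ID    <- factor(df$ID, levels = ID.order)
df$Class <- factor(df$Class, levels = rev(sort.class))
```

The resulting `df` can then be passed to the same `ggplot(aes(x = ID, y = Value, fill = Class)) + geom_col(width = 1)` call as in the answer.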
Sample data:
library(tidyr)

set.seed(1234)
df <- data.frame(
  ID = seq(1, 100),
  A = sample(seq(2, 3), 100, replace = TRUE),
  B = sample(seq(5, 9), 100, replace = TRUE),
  C = sample(seq(3, 7), 100, replace = TRUE),
  D = sample(seq(1, 2), 100, replace = TRUE)
) %>%
  gather(Class, Value, -ID) %>%
  group_by(ID) %>%
  mutate(Value = Value / sum(Value)) %>%
  ungroup() %>%
  arrange(ID, Class)
> df
# A tibble: 400 x 3
ID Class Value
<int> <chr> <dbl>
1 1 A 0.143
2 1 B 0.357
3 1 C 0.429
4 1 D 0.0714
5 2 A 0.176
6 2 B 0.412
7 2 C 0.294
8 2 D 0.118
9 3 A 0.2
10 3 B 0.4
# ... with 390 more rows