我之所以问这个问题,是因为即使这个网站上有很多类似的问题(例如this,this和this),但这些都不是我的真实情况。实际上,这个link提出的问题与我的相同,但是我的答案还不清楚,并提出了我要提出的问题。
我有一个数据集,我将从中构造一个堆积的条形图,而我不知道如何在“相似”的人聚集在一起的地方布置堆积的条形图。我从事生物信息学研究,这是d×n矩阵的数据集。在这个玩具数据集中,有d = 10个祖先种群,n = 5个个体:
> a
V1 V2 V3 V4 V5
1 0.534410243 0.009358740 0.011295181 0.2141751740 0.0030129254
2 0.026653603 0.372426720 0.447847534 0.0179177507 0.4072904477
3 0.193317915 0.003605024 0.003186611 0.4832114736 0.0007095471
4 0.111881585 0.000000000 0.000000000 0.2296213741 0.0119233461
5 0.089696570 0.591163629 0.509774416 0.0032542030 0.5535847030
6 0.007543558 0.000000000 0.000000000 0.0364907757 0.0013148362
7 0.004862942 0.000000000 0.002123909 0.0146682272 0.0004053690
8 0.009276195 0.011710457 0.014367894 0.0000000000 0.0000000000
9 0.006903171 0.004314528 0.011404455 0.0000000000 0.0126889937
10 0.015454219 0.007420903 0.000000000 0.0006610215 0.0090698319
所有列的总和为1。我创建了一个堆叠的barplot,如下所示:
pop <- rownames(a)
a <- a %>% mutate(pop = rownames(a))
a_long <- gather(a, key, value, -pop)
# trying to create a similarity index
a_long <- a_long %>% group_by(key) %>%
mutate(mean = mean(value)) %>%
arrange(desc(mean))
# looking at some of the expanded dataset
> a_long[1:20,]
# A tibble: 20 x 4
# Groups: key [2]
pop key value mean
<chr> <chr> <dbl> <dbl>
1 1 V2 0.00936 0.1
2 2 V2 0.372 0.1
3 3 V2 0.00361 0.1
4 4 V2 0 0.1
5 5 V2 0.591 0.1
6 6 V2 0 0.1
7 7 V2 0 0.1
8 8 V2 0.0117 0.1
9 9 V2 0.00431 0.1
10 10 V2 0.00742 0.1
11 1 V4 0.214 0.1
12 2 V4 0.0179 0.1
13 3 V4 0.483 0.1
14 4 V4 0.230 0.1
15 5 V4 0.00325 0.1
16 6 V4 0.0365 0.1
17 7 V4 0.0147 0.1
18 8 V4 0 0.1
19 9 V4 0 0.1
20 10 V4 0.000661 0.1
# colors
v_colors <- c("#440154FF", "#443B84FF", "#34618DFF", "#404588FF", "#1FA088FF", "#40BC72FF",
"#67CC5CFF", "#A9DB33FF", "#DDE318FF", "#FDE725FF")
plot <- ggplot(a_long, aes(x = key, y = value, fill = pop))
plot + geom_bar(position="stack", stat="identity") + scale_fill_manual(values = v_colors)
如何使输出看起来更整洁,例如在X轴上,人口5血统比例较高的个体是否彼此相邻?到目前为止,我已经尝试计算每个人的价值“均值”,但是由于它不是一个很好的衡量标准,所以它没有用。如何创建一个相似性指数,告诉我个体1与个体2的相似程度,然后如何在x轴上对它们进行排序,以使它们看起来很好(例如,像this figure中的条形图) )?
谢谢!
最后一件事:如果要重新创建数据集a
,请使用以下代码:
v1 = c(0.534410243, 0.026653603, 0.193317915, 0.111881585, 0.089696570, 0.007543558, 0.004862942, 0.009276195, 0.006903171, 0.015454219)
v2 = c(0.009358740, 0.372426720, 0.003605024, 0.000000000, 0.591163629, 0.000000000, 0.000000000, 0.011710457, 0.004314528, 0.007420903)
v3 = c(0.011295181, 0.447847534, 0.003186611, 0.000000000, 0.509774416, 0.000000000, 0.002123909, 0.014367894, 0.011404455, 0.000000000)
v4 = c(0.2141751740, 0.0179177507, 0.4832114736, 0.2296213741, 0.0032542030, 0.0364907757, 0.0146682272, 0.0000000000, 0.0000000000, 0.0006610215)
v5 = c(0.0030129254, 0.4072904477, 0.0007095471, 0.0119233461, 0.5535847030, 0.0013148362, 0.0004053690, 0.0000000000, 0.0126889937, 0.0090698319)
a = data.frame(V1 = v1, V2 = v2, V3 = v3, V4 = v4, V5 = v5)