Ordering a factor to make a clustered ggplot heatmap, but getting a strange error

Asked: 2018-01-30 00:34:40

Tags: r ggplot2 dplyr cluster-analysis heatmap

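I'm trying to make a clustered heatmap as described in "Cluster data in heat map in R ggplot", and I've run into a confusing error.

I can make an unclustered distance heatmap as follows: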

library(vegan)
library(tidyverse)
data(varespec)
library(reshape2)
library(viridis)

# Calculate a distance matrix
vare.dist <- vegdist(varespec)

# Cluster the distance matrix.
vare.hc <- hclust(as.dist(vare.dist))

# Process and melt the distance matrix
vare.dist.long <- vare.dist %>% as.matrix %>% melt %>%
mutate(Var1 = as.character(Var1), Var2 = as.character(Var2))

# Plot the heatmap
vare.dist.long %>% # as.matrix %>% .[vare.hc$order, vare.hc$order] %>% melt %>%
  ggplot(aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  scale_fill_viridis(direction = 1) +
  theme(axis.text.x = element_text(angle = 270, hjust = 0, vjust = 0.5))

[Image: unclustered heatmap]

To cluster the heatmap, I need to convert vare.dist.long$Var1 and vare.dist.long$Var2 into correctly ordered factors. I thought I could do it like this:

# Step 1: works without complaint
vare.dist.long1 <- vare.dist.long %>% mutate(Var1 = factor(Var1, levels = Var1[vare.hc$order]))
# Step 2: throws error
vare.dist.long2 <- vare.dist.long %>% mutate(Var2 = factor(Var2, levels = Var2[vare.hc$order]))

and then replace vare.dist.long with vare.dist.long3 in the plotting call.

Strangely, ordering Var1 (the # Step 1 line) runs without complaint, but when I try to do exactly the same thing to Var2 (the # Step 2 line) I get the following error:

Error in mutate_impl(.data, dots): Evaluation error: factor level [2] is duplicated.
Traceback:

1. vare.dist.long %>% mutate(Var2 = factor(Var2, levels = Var2[vare.hc$order]))
2. withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
3. eval(quote(`_fseq`(`_lhs`)), env, env)
4. eval(quote(`_fseq`(`_lhs`)), env, env)
5. `_fseq`(`_lhs`)
6. freduce(value, `_function_list`)
7. withVisible(function_list[[k]](value))
8. function_list[[k]](value)
9. mutate(., Var2 = factor(Var2, levels = Var2[vare.hc$order]))
10. mutate.data.frame(., Var2 = factor(Var2, levels = Var2[vare.hc$order]))
11. as.data.frame(mutate(tbl_df(.data), ...))
12. mutate(tbl_df(.data), ...)
13. mutate.tbl_df(tbl_df(.data), ...)
14. mutate_impl(.data, dots)

What am I missing here? Why can't I mutate Var2, which as far as I can tell is almost identical to Var1, just in a different order?

1 answer:

Answer 0 (score: 1)

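The vector supplied to the levels argument must not contain any duplicates. If you type the following in the console, you will see that you are supplying the same level for every position when you index Var2.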

vare.dist.long$Var2[vare.hc$order]
# [1] "18" "18" "18" "18" "18" "18" "18" "18" "18" "18" "18" "18" "18" "18" "18" "18" "18" "18"
# [19] "18" "18" "18" "18" "18" "18"
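The reason Var1 happens to work while Var2 fails can be reproduced on a toy matrix. The sketch below is an illustration only: it uses base R's as.table() plus as.data.frame() as a stand-in for reshape2::melt() (both put the row index in Var1, cycling fastest), and `m`, `long`, and `ord` are hypothetical names standing in for the distance matrix, vare.dist.long, and vare.hc$order. Because the long table has one row per cell, its first n rows contain every row name once in Var1 but only the first column name, repeated, in Var2.

```r
# Toy 3 x 3 symmetric matrix standing in for as.matrix(vare.dist)
m <- matrix(c(0, 1, 2,
              1, 0, 3,
              2, 3, 0),
            nrow = 3,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Long format: Var1 (rows) cycles fastest, Var2 (columns) changes every 3 rows
long <- as.data.frame(as.table(m), stringsAsFactors = FALSE)

# Hypothetical cluster order, standing in for vare.hc$order
ord <- c(3, 1, 2)

# Indexing the long columns only ever touches the first 3 of the 9 rows
var1_levels <- long$Var1[ord]  # "c" "a" "b": distinct, so Step 1 works by luck
var2_levels <- long$Var2[ord]  # "a" "a" "a": duplicated, so Step 2 fails

# Capture the error message instead of stopping the script
step2 <- tryCatch(
  factor(long$Var2, levels = var2_levels),
  error = function(e) conditionMessage(e)
)
print(step2)

# De-duplicating first gives one label per row/column, which ord can reorder
fixed <- factor(long$Var2, levels = unique(long$Var2)[ord])
print(levels(fixed))
```

So Step 1 in the question only succeeds because the first rows of Var1 happen to list every site exactly once.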

I think the following will work. unique(Var1) and unique(Var2) are there to make sure there are no duplicates.

vare.dist.long1 <- vare.dist.long %>% mutate(Var1 = factor(Var1, levels = unique(Var1)[vare.hc$order]))

vare.dist.long2 <- vare.dist.long %>% mutate(Var2 = factor(Var2, levels = unique(Var2)[vare.hc$order]))
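As a possible alternative (not part of the answer above), the level order can be taken directly from the hclust object, which stores the site labels in `labels` and the dendrogram order in `order`. The sketch below assumes the same packages and data as the question; `site.order`, `vare.dist.clustered`, and `p` are hypothetical names introduced here for illustration. It sets both factors in one mutate() call so that a single data frame can be passed to the plot.

```r
library(vegan)
library(dplyr)
library(reshape2)
library(ggplot2)
library(viridis)

data(varespec)

# Distance matrix and clustering, as in the question
vare.dist <- vegdist(varespec)
vare.hc <- hclust(vare.dist)

# Site labels in dendrogram order (one entry per site, so no duplicates)
site.order <- vare.hc$labels[vare.hc$order]

# Melt the matrix and give both axes the same clustered level order
vare.dist.clustered <- vare.dist %>%
  as.matrix() %>%
  melt() %>%
  mutate(Var1 = factor(as.character(Var1), levels = site.order),
         Var2 = factor(as.character(Var2), levels = site.order))

# Same plot as before, now drawn in cluster order
p <- ggplot(vare.dist.clustered, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  scale_fill_viridis(direction = 1) +
  theme(axis.text.x = element_text(angle = 270, hjust = 0, vjust = 0.5))
p
```

Using one shared vector for both columns keeps the rows and columns of the heatmap in the same order, so the diagonal stays on the diagonal.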