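I'm trying to make a clustered heatmap, as described in "Cluster data in heat map in R ggplot", and I've run into a confusing error.
I can make an unclustered distance heatmap as follows: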
library(vegan)
library(tidyverse)
data(varespec)
library(reshape2)
library(viridis)
# Calculate a distance matrix
vare.dist <- vegdist(varespec)
# Cluster the distance matrix.
vare.hc <- hclust(as.dist(vare.dist))
# Process and melt the distance matrix
vare.dist.long <- vare.dist %>% as.matrix %>% melt %>%
  mutate(Var1 = as.character(Var1), Var2 = as.character(Var2))
# Plot the heatmap
vare.dist.long %>% # as.matrix %>% .[vare.hc$order, vare.hc$order] %>% melt %>%
  ggplot(aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  scale_fill_viridis(direction = 1) +
  theme(axis.text.x = element_text(angle = 270, hjust = 0, vjust = 0.5))
To cluster the heatmap, I need to convert vare.dist.long$Var1 and vare.dist.long$Var2 into correctly ordered factors. I thought I could do it like this:
# Step 1: works without complaint
vare.dist.long1 <- vare.dist.long %>% mutate(Var1 = factor(Var1, levels = Var1[vare.hc$order]))
# Step 2: throws error
vare.dist.long2 <- vare.dist.long %>% mutate(Var2 = factor(Var2, levels = Var2[vare.hc$order]))
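and then replace vare.dist.long with vare.dist.long3 in the plotting call.
Oddly, while reordering Var1 (the # Step 1 line) goes through without complaint, when I try to do exactly the same thing to Var2 (the # Step 2 line) I get the following error: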
Error in mutate_impl(.data, dots): Evaluation error: factor level [2] is duplicated.
Traceback:
1. vare.dist.long %>% mutate(Var2 = factor(Var2, levels = Var2[vare.hc$order]))
2. withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
3. eval(quote(`_fseq`(`_lhs`)), env, env)
4. eval(quote(`_fseq`(`_lhs`)), env, env)
5. `_fseq`(`_lhs`)
6. freduce(value, `_function_list`)
7. withVisible(function_list[[k]](value))
8. function_list[[k]](value)
9. mutate(., Var2 = factor(Var2, levels = Var2[vare.hc$order]))
10. mutate.data.frame(., Var2 = factor(Var2, levels = Var2[vare.hc$order]))
11. as.data.frame(mutate(tbl_df(.data), ...))
12. mutate(tbl_df(.data), ...)
13. mutate.tbl_df(tbl_df(.data), ...)
14. mutate_impl(.data, dots)
What am I missing here? Why can't I mutate Var2, which as far as I can tell is nearly identical to Var1, just in a different order?
Answer 0 (score: 1)
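The vector supplied to the levels argument must not contain any duplicates. vare.hc$order holds the indices 1 to 24, and in the melted data the first 24 rows cycle through every value of Var1 but all share the same value of Var2. So Var1[vare.hc$order] happens to give 24 distinct names, while Var2[vare.hc$order] gives the same name 24 times. If you enter the following in the console, you will see that you are supplying the same level for every entry: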
vare.dist.long$Var2[vare.hc$order]
# [1] "18" "18" "18" "18" "18" "18" "18" "18" "18" "18" "18" "18" "18" "18" "18" "18" "18" "18"
# [19] "18" "18" "18" "18" "18" "18"
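The same behaviour can be reproduced on a toy matrix. This is only a sketch in base R: the 3 x 3 matrix m and the order vector ord are made up for illustration, and as.data.frame(as.table(m)) is used as a stand-in for melt(), on the assumption that both emit the cells with the first index varying fastest.

```r
# Toy 3 x 3 matrix standing in for as.matrix(vare.dist) (hypothetical data)
m <- matrix(1:9, nrow = 3, dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Base-R stand-in for melt(): one row per cell, first index varying fastest
long <- as.data.frame(as.table(m), stringsAsFactors = FALSE)

# Hypothetical clustering order, playing the role of vare.hc$order
ord <- c(3, 1, 2)

long$Var1[ord]  # three distinct names: usable as levels
long$Var2[ord]  # the same name three times: not usable as levels

# Capture the error message instead of stopping the script
err <- tryCatch(
  {
    factor(long$Var2, levels = long$Var2[ord])
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
err

# De-duplicating first gives one level per name, in the requested order
fixed <- factor(long$Var2, levels = unique(long$Var2)[ord])
levels(fixed)
```

Indexing the raw column with ord only looks at the first three rows, which is why Var1 appears to work and Var2 does not.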
I think the following should work. unique(Var1) and unique(Var2) are there to make sure the levels contain no duplicates.
vare.dist.long1 <- vare.dist.long %>% mutate(Var1 = factor(Var1, levels = unique(Var1)[vare.hc$order]))
vare.dist.long2 <- vare.dist.long %>% mutate(Var2 = factor(Var2, levels = unique(Var2)[vare.hc$order]))
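For the plot itself, both columns need to be reordered in the same data frame. The sketch below is one possible way to do that, not necessarily what the answer intended: it takes the level order from vare.hc$labels[vare.hc$order] instead of unique(), which should give the same names in the same order, and the names ord.labels and vare.dist.clustered are introduced here purely for illustration.

```r
library(vegan)
library(dplyr)
library(reshape2)
library(ggplot2)

data(varespec)

# Distance matrix and hierarchical clustering, as in the question
vare.dist <- vegdist(varespec)
vare.hc <- hclust(vare.dist)

# Site names in dendrogram order (ord.labels is an illustrative name)
ord.labels <- vare.hc$labels[vare.hc$order]

# Melt once, then order both axes with the same set of levels
vare.dist.clustered <- vare.dist %>%
  as.matrix() %>%
  melt() %>%
  mutate(Var1 = factor(Var1, levels = ord.labels),
         Var2 = factor(Var2, levels = ord.labels))

# Same plot as before; the axes now follow the clustering order
p <- ggplot(vare.dist.clustered, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  theme(axis.text.x = element_text(angle = 270, hjust = 0, vjust = 0.5))
p
```

Using a single vector of levels for both columns keeps the x and y axes in the same order, so the diagonal of the distance matrix stays on the diagonal of the heatmap.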