Reordering a ggplot2 heatmap based on hierarchical clustering

Asked: 2017-08-04 10:42:48

Tags: r ggplot2 heatmap hierarchical-clustering

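I am still struggling with ggplot2: although I found similar questions, I have not managed to get this working. I want to reorder both the columns and the rows of a heatmap according to a hierarchical clustering.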

Here is my actual code:

# import
library("ggplot2")
library("scales")
library("reshape2")

# data loading
data_frame = read.csv(file=input_file, header=TRUE, row.names=1, sep='\t')

# clustering with hclust on row and on column
dd.col <- as.dendrogram(hclust(dist(data_frame)))
dd.row <- as.dendrogram(hclust(dist(t(data_frame))))

# ordering based on clustering
col.ord <- order.dendrogram(dd.col)
row.ord <- order.dendrogram(dd.row)


# making a new data frame reordered 
new_df = as.data.frame(data_frame[col.ord, row.ord])
print(new_df)   # when manually inspecting new_df, it seems to work

# get the row name
name = as.factor(row.names(new_df))

# reshape
melte_df = melt(cbind(name, new_df))

# the solution is to reorder the factor levels of the name column
melte_df$name = factor(melte_df$name, levels = row.names(data_frame)[as.vector(row.ord)])

# ggplot2 dark magic
(p <- ggplot(melte_df, aes(variable, name)) + geom_tile(aes(fill = value),
 colour = "white") + scale_fill_gradient(low = "white",
 high = "steelblue") + theme(text=element_text(size=12),
 axis.text.y=element_text(size=3)))

# save fig
ggsave(file = "test.pdf")

# the result is ordered only by column; what have I missed?

I like R a lot, so any answer would be very welcome.

1 answer:

Answer 0 (score: 1)

Without an example dataset to reproduce this, I'm not 100% sure that's the reason, but I would guess that your problem lies in this line:

name = as.factor(row.names(new_df))

When you use a factor, the ordering is based on the ordering of the levels of that factor. You can reorder your data frame as much as you want, the order used when plotting will be the order of your levels.

Here's an example:

data_frame <- data.frame(x = c("apple", "banana", "peach"), y = c(50, 30, 70))
data_frame
       x  y
1  apple 50
2 banana 30
3  peach 70

data_frame$x <- as.factor(data_frame$x) # Make x column a factor

levels(data_frame$x) # This shows the levels of your factor
[1] "apple"  "banana" "peach" 

data_frame <- data_frame[order(data_frame$y),] # Order by value of y
data_frame
       x  y
2 banana 30
1  apple 50
3  peach 70

# Now let's plot it:
p <- ggplot(data_frame, aes(x)) + geom_bar(aes(weight=y))
p

This is the result:

[Image: example-result, the bar chart produced by the code above]

See? It's not ordered by the y value as we wanted. It's ordered by the levels of the factor. Now, if that's indeed where your problem lies, there are solutions in the question "R - Order a factor based on value in one or more other columns".

An applied example of the solution with dplyr:

library(dplyr)
data_frame <- data_frame %>%
       arrange(y) %>%          # sort your dataframe
       mutate(x = factor(x, levels = x)) # reset your factor column based on that order

data_frame
       x  y
1 banana 30
2  apple 50
3  peach 70

levels(data_frame$x) # Levels of the factor are reordered!
[1] "banana" "apple"  "peach" 

p <- ggplot(data_frame, aes(x)) + geom_bar(aes(weight=y))
p

This is the result now:

[Image: the bar chart produced by the code above]
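If you would rather not depend on dplyr, the same reordering can be sketched in base R. This is only an illustration on the toy data from above; the names `df` and `x2` are made up for the example. The first option passes the values in the wanted order as the levels, the second uses `reorder()` from the stats package, which sorts the levels by a summary of `y` (the mean by default).

```r
# Toy data from the example above (df and x2 are hypothetical names)
df <- data.frame(x = c("apple", "banana", "peach"), y = c(50, 30, 70))

# Option 1: pass the values, sorted by y, as the factor levels
df$x <- factor(df$x, levels = df$x[order(df$y)])
levels(df$x)
# [1] "banana" "apple"  "peach"

# Option 2: reorder() sorts the levels by a summary of y per level
df$x2 <- reorder(df$x, df$y)
levels(df$x2)
# [1] "banana" "apple"  "peach"
```

Either way the rows of `df` stay where they are; only the level order changes, and that is what ggplot2 uses for a discrete axis.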

I hope this helps; otherwise, you might want to share an example of your original dataset!
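Applied to the heatmap in the question, the same idea might look like the sketch below. It is a guess at a fix rather than a tested solution for your data: `mat` is a small synthetic matrix standing in for your table, and `long_df` is a hypothetical name. Two things are worth noting. First, `dist()` measures distances between rows, so in your code `dist(t(data_frame))` yields the column ordering; using that ordering to index `row.names(data_frame)` for the levels of `name` looks suspect. Second, both axes are factors after reshaping, so both need their levels set from the clustering.

```r
# Synthetic stand-in for the real table (hypothetical data, fixed seed)
set.seed(1)
mat <- matrix(rnorm(20), nrow = 5,
              dimnames = list(paste0("gene", 1:5), paste0("s", 1:4)))

# dist() works on rows, so cluster t(mat) to get the column ordering
row.ord <- order.dendrogram(as.dendrogram(hclust(dist(mat))))
col.ord <- order.dendrogram(as.dendrogram(hclust(dist(t(mat)))))

# Long format with base R: one row per (name, variable) cell
long_df <- as.data.frame(as.table(mat), responseName = "value")
names(long_df)[1:2] <- c("name", "variable")

# The axis order follows the factor levels, so take them from the clustering
long_df$name     <- factor(long_df$name,     levels = rownames(mat)[row.ord])
long_df$variable <- factor(long_df$variable, levels = colnames(mat)[col.ord])

# Build the plot only if ggplot2 is installed; print(p) draws it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long_df, aes(variable, name)) +
    geom_tile(aes(fill = value), colour = "white") +
    scale_fill_gradient(low = "white", high = "steelblue")
}
```

With the levels set this way there is no need to reorder the data frame itself before melting, because ggplot2 ignores the row order for discrete axes.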