Adding text values to a ggplot on a secondary axis

Asked: 2019-04-30 14:05:15

Tags: r ggplot2

I have to create a plot.

Below is my sample data frame:

data = data.frame("Tissue" = c("Adrenal gland", "Appendix", "Appendix"),
                  "protein.expression" = c("No detect", "No detect", "Medium"),
                  "cell.type" = c("Glandular cells", "Lymphoid tissu", "Glandular cells"))

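The left y-axis shows the unique tissue types. The right axis should show the comma-separated cell types.

I'm not sure how to take the cell types for each tissue (currently plotted along the y-axis) and show them on the secondary y-axis as a comma-separated list.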

My code is:

p1 <- ggplot(dat %>% filter(facet == 1),
             aes(x = tissue,
                 y = factor(protein.expression,
                            levels = unique(protein.expression, decreasing = F),
                            ordered = TRUE),
                 fill = protein.expression,
                 label = cell.type)) +
  geom_point(stat = 'identity', aes(col = protein.expression), size = 12) +
  geom_text(size = 6, fontface = "bold", colour = "white") +
  geom_label() +
  # facet_grid(cell.type ~ ., scales = "free", space = "free") +
  scale_fill_manual(values = myPalette, drop = FALSE) +
  scale_color_manual(values = myPalette, drop = FALSE) +
  theme_classic() +
  labs(title="Protein Atlas") + 
  guides(fill=guide_legend(title="Protein expression"))+
  ylab("Cell types measured per tissue")+
  #ylim(1,4) +
  coord_flip()+
  theme(axis.text.x = element_text(size = 25, vjust = 0.5, hjust = .9),
        axis.text.y = element_text(size = 25),
        legend.position = "none",
        axis.title.x = element_text(size=30),
        axis.title.y = element_text(size = 30, margin = margin(t = 0, r = 20, b = 0, l = 0)),
        legend.title = element_text(size = 30),
        legend.text = element_text(size = 25),
        legend.key.size = unit(2, 'cm'),
        axis.ticks.length=unit(.01, "cm"),
        strip.text.y = element_text(angle = 0))

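The cell types are currently drawn as points. I would like them on the right, comma-separated and, if possible, coloured according to the corresponding protein expression label.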

1 Answer:

Answer 0: (score: 3)

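So this is a bit of a hack, but it might work for you.

  1. I introduce a third column in the chart to hold the labels, as in my original post.

  2. I pre-process your data to spread the labels in that third column around the position of the Tissue variable, so that they don't sit on top of each other.

My pre-processing is ugly, but it works. Note that, going by your comment, I only cater for up to 4 cell.types per tissue.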

It gives me this plot: [image]

My code:

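library(dplyr)
library(ggplot2)

# stringsAsFactors = TRUE keeps Tissue a factor on R >= 4.0; the
# as.numeric(Tissue) calls below rely on that to get integer axis positions.
data = data.frame("Tissue" = c("Adrenal gland", "Appendix", "Appendix"),
                  "protein.expression" = c("No detect", "No detect", "Medium"),
                  "cell.type" = c("Glandular cells", "Lymphoid tissu", "Glandular cells"),
                  stringsAsFactors = TRUE)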

# Pre-processing section. 
# Step 1: find out the n of cell.types per tissue type
counters <- data %>% group_by(Tissue) %>% summarise(count = n())

# Step 2: Join n back to original data. Transform protein.expression to ordered factor
data <- data %>%
  inner_join(counters, by="Tissue") %>% 
  mutate(protein = factor(protein.expression, levels=unique(protein.expression, decreasing = F), ordered=TRUE),
         positionTissue = as.numeric(Tissue))

results <- data.frame()

# Step 3: Spread the cell.type labels around the position of the Tissue. 4 scenarios catered for.
for(t in unique(data$Tissue)){
  subData <- filter(data, Tissue == t)
  subData$spreader <- as.numeric(subData$Tissue)

  if(length(unique(subData$cell.type)) == 2){
    subData <- subData %>%
      mutate(x=factor(cell.type, levels=unique(cell.type, decreasing = F),ordered=TRUE),
             spreader = ifelse(as.numeric(x)==1,as.numeric(Tissue)-0.1,as.numeric(Tissue)+0.1)) %>%
      select(-x)

    results <- rbind(results, subData)
  } else if(length(unique(subData$cell.type)) == 3){
    subData <- subData %>%
      mutate(x=factor(cell.type, levels=unique(cell.type, decreasing = F),ordered=TRUE),
             spreader = ifelse(as.numeric(x)==1,as.numeric(Tissue)-0.15,
                              ifelse(as.numeric(x)==3,as.numeric(Tissue)+0.15,as.numeric(Tissue)))) %>%
      select(-x)

    results <- rbind(results, subData)
  } else if(length(unique(subData$cell.type)) == 4){
    subData <- subData %>%
      mutate(x=factor(cell.type, levels=unique(cell.type, decreasing = F),ordered=TRUE),
             spreader = ifelse(as.numeric(x)==1,as.numeric(Tissue)-0.2,
                           ifelse(as.numeric(x)==2,as.numeric(Tissue)-0.1,
                                  ifelse(as.numeric(x)==3,as.numeric(Tissue)+0.1,
                                         ifelse(as.numeric(x)==4,as.numeric(Tissue)+0.2,as.numeric(Tissue)))))) %>%
      select(-x)

    results <- rbind(results, subData)
  } else{
    results <- rbind(results, subData)
  }
}

# Plot the data based on the new label position "spreader" variable
ggplot(results, aes(x = positionTissue, y = protein, label=cell.type)) +
  geom_point(stat='identity', aes(col=protein.expression), size=12)  +
  geom_text(aes(y=0.5,label=Tissue), size=8, fontface="bold", angle=90)+
  geom_label(aes(y="zzz", x=spreader, fill=protein), colour="white") +
  theme_classic() +
  scale_x_continuous(limits = c(min(as.numeric(data$Tissue))-0.5,max(as.numeric(data$Tissue))+0.5))+
  scale_y_discrete(breaks=c("Medium","No detect")) +
  labs(title="Protein Atlas") + 
  guides(fill=guide_legend(title="Protein expression"))+
  ylab("Cell types measured per tissue") +
  xlab("") +
  #ylim(1,4) +
  coord_flip()+
  theme(axis.text.x = element_text(size = 25),
        axis.text.y = element_text(colour = NA),
        legend.position = "none",
        axis.title.x = element_text(size=30),
        axis.title.y = element_text(size = 30, margin = margin(t = 0, r = 20, b = 0, l = 0)),
        legend.title = element_text(size = 30),
        legend.text = element_text(size = 25),
        legend.key.size = unit(2, 'cm'),
        axis.ticks.length=unit(.01, "cm"),
        strip.text.y = element_text(angle = 0))
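
The loop above hard-codes the offsets for 2, 3 and 4 cell types. If you need more than that, the same idea can probably be written once for any number of labels. Here is a minimal sketch, not the code I used for the plot: `step` and `idx` are names I made up for illustration, and with `step = 0.2` it reproduces the ±0.1 offsets of the two-label case, while the three- and four-label offsets come out slightly wider than the ones in the loop.

```r
library(dplyr)

data <- data.frame(
  Tissue = c("Adrenal gland", "Appendix", "Appendix"),
  protein.expression = c("No detect", "No detect", "Medium"),
  cell.type = c("Glandular cells", "Lymphoid tissu", "Glandular cells")
)

# Hypothetical parameter: distance between neighbouring labels, in axis units.
step <- 0.2

spread <- data %>%
  # Integer axis position of each tissue (alphabetical factor order).
  mutate(positionTissue = as.numeric(factor(Tissue))) %>%
  group_by(Tissue) %>%
  mutate(
    # Rank of each cell type within its tissue, in order of first appearance.
    idx = match(cell.type, unique(cell.type)),
    n_types = n_distinct(cell.type),
    # Centre the ranks on zero: a single label stays on the tissue position,
    # several labels fan out symmetrically around it.
    spreader = positionTissue + (idx - (n_types + 1) / 2) * step
  ) %>%
  ungroup()

print(spread[, c("Tissue", "cell.type", "spreader")])
```

The resulting `spreader` column can be used in `geom_label(aes(x = spreader, ...))` exactly like the one built by the loop.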

Edit #2:

Updated to preserve the label colours by creating n positions (where n is the number of cell.types):

data = data %>% 
  mutate(position = paste("z",cell.type))

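You can then use this new position variable in place of the static "zzz" I suggested in the original post. Your labels will have the correct colours, but the chart will look odd if there are many cell.types.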

  geom_label(aes(y=position, label = cell.type)) +
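
To show how the pieces of this edit fit together, here is a stripped-down sketch on the sample data (theme settings omitted). The `expression_breaks` helper is my own name, not something from the question. The labels go to one "z ..." position per cell type, and only the real expression levels are listed as axis breaks. As far as I know, a discrete scale built from character values is sorted alphabetically, which is why the "z" prefix pushes the label positions past the expression levels.

```r
library(dplyr)
library(ggplot2)

data <- data.frame(
  Tissue = c("Adrenal gland", "Appendix", "Appendix"),
  protein.expression = c("No detect", "No detect", "Medium"),
  cell.type = c("Glandular cells", "Lymphoid tissu", "Glandular cells")
) %>%
  # One label position per cell type, as in the edit above.
  mutate(position = paste("z", cell.type))

# Hypothetical helper: only the real expression levels get axis breaks,
# so the "z ..." label positions stay unlabelled on the axis.
expression_breaks <- unique(data$protein.expression)

p <- ggplot(data, aes(x = Tissue, y = protein.expression)) +
  geom_point(aes(colour = protein.expression), size = 12) +
  # Each label is filled with the colour of its own expression level.
  geom_label(aes(y = position, label = cell.type, fill = protein.expression),
             colour = "white") +
  scale_y_discrete(breaks = expression_breaks) +
  coord_flip() +
  theme_classic() +
  theme(legend.position = "none")

print(p)
```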

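Edit #1: Updated to avoid overlapping labels by grouping the cell.types into one label per tissue.

Create a new label field that concatenates the individual labels for each tissue type: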

data = data %>% 
  group_by(Tissue) %>%
  mutate(label = paste(cell.type, collapse = "; "))

And modify the ggplot call to use this new field instead of the existing cell.type field:

  geom_text(aes(y="zzz", label = label), size = 6, fontface = "bold", colour = "white")+

Or:

  geom_label(aes(y="zzz", label = label)) +
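
One thing to be aware of with the `mutate()` version: the combined label is repeated on every row of a tissue, so the same text is drawn once per row on top of itself. A possible variation, sketched here under the assumption that one label per tissue is all you need, is to collapse the data to one row per tissue with `summarise()` and to drop repeated cell types with `unique()`. The `labels` data frame is a name I introduced for illustration.

```r
library(dplyr)

data <- data.frame(
  Tissue = c("Adrenal gland", "Appendix", "Appendix"),
  protein.expression = c("No detect", "No detect", "Medium"),
  cell.type = c("Glandular cells", "Lymphoid tissu", "Glandular cells")
)

# One row per tissue, with its distinct cell types joined into one string.
labels <- data %>%
  group_by(Tissue) %>%
  summarise(label = paste(unique(cell.type), collapse = "; "))

print(labels)
```

This separate data frame can then be handed to the label layer on its own, for example `geom_label(data = labels, aes(x = Tissue, y = "zzz", label = label), inherit.aes = FALSE)`. A single label per tissue cannot carry a separate colour for each cell type, so this fits Edit #1 rather than Edit #2.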

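Original post: You can plot the labels at a third position (e.g. "zzz") and then hide that position from the axis labels by restricting the scale breaks to the real levels, here with scale_y_discrete(breaks = c("Medium", "No detect")).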

ggplot(data, aes(x = Tissue, y = factor(protein.expression,
                                    levels=unique(protein.expression, 
                                                  decreasing = F),
                                    ordered=TRUE), fill = protein.expression, 
             label = cell.type))+
  geom_point(stat='identity', aes(col=protein.expression), size=12)  +
  geom_text(aes(y="zzz"), size = 6, fontface = "bold", colour = "white")+
  geom_label(aes(y="zzz")) +
  # facet_grid(cell.type ~ ., scales = "free", space = "free") +
  # scale_fill_manual(values = myPalette, drop = FALSE) +
  # scale_color_manual(values = myPalette, drop = FALSE) +
  theme_classic() +
  scale_y_discrete(breaks=c("Medium","No detect"))+
  labs(title="Protein Atlas") + 
  guides(fill=guide_legend(title="Protein expression"))+
  ylab("Cell types measured per tissue") +
  #ylim(1,4) +
  coord_flip()+
  theme(axis.text.x = element_text(size = 25, vjust = 0.5, hjust = .9),
        axis.text.y = element_text(size = 25),
        legend.position = "none",
        axis.title.x = element_text(size=30),
        axis.title.y = element_text(size = 30, margin = margin(t = 0, r = 20, b = 0, l = 0)),
        legend.title = element_text(size = 30),
        legend.text = element_text(size = 25),
        legend.key.size = unit(2, 'cm'),
        axis.ticks.length=unit(.01, "cm"),
        strip.text.y = element_text(angle = 0))
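
Since the question title asks about a secondary axis, here is an alternative sketch that is not what I did above. If the tissues are mapped to numeric positions, `scale_x_continuous()` accepts a `sec.axis`, and `dup_axis()` can show the comma-separated cell types as the tick labels of that second axis. The `axis_labels` data frame and its `pos` and `cells` columns are names I made up. The trade-off is that axis text cannot be coloured per cell type the way the labels can.

```r
library(dplyr)
library(ggplot2)

data <- data.frame(
  Tissue = c("Adrenal gland", "Appendix", "Appendix"),
  protein.expression = c("No detect", "No detect", "Medium"),
  cell.type = c("Glandular cells", "Lymphoid tissu", "Glandular cells")
)

# One row per tissue: numeric axis position plus comma-separated cell types.
axis_labels <- data %>%
  mutate(pos = as.numeric(factor(Tissue))) %>%
  group_by(pos, Tissue) %>%
  summarise(cells = paste(unique(cell.type), collapse = ", "), .groups = "drop")

p <- ggplot(data, aes(x = as.numeric(factor(Tissue)), y = protein.expression)) +
  geom_point(aes(colour = protein.expression), size = 12) +
  scale_x_continuous(
    breaks = axis_labels$pos,       # one tick per tissue
    labels = axis_labels$Tissue,    # primary axis: tissue names
    # Secondary axis: same ticks, labelled with the cell types instead.
    sec.axis = dup_axis(labels = axis_labels$cells, name = "Cell types")
  ) +
  coord_flip() +                    # tissues on the left, cell types on the right
  labs(x = "Tissue", y = "Protein expression") +
  theme_classic() +
  theme(legend.position = "none")

print(p)
```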