Question

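I have searched for ways to draw this kind of plot with matplotlib or ggplot, but I can't figure out how to make it.

This is the graph that I want to make:

From Nature 500(7463):415-421, August 2013.

So I want to plot the values as points and mark the median, to show the distribution.

Thanks a lot!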

Answer 1

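This question is really about how to research the literature. So let's do that.

Here's the article in PubMed. It is also freely available at PubMed Central. There we find the supplementary data files in XLS format. The data file closest to what we need is this XLS file. Unfortunately, exploring it shows that it contains only 8 different tissue types, whereas Figure 1 contains 30. So we cannot reproduce the figure from that data. This is not uncommon in science.

However, the figure legend points us to this article, which contains a similar figure. The data is available in this XLS file.

I downloaded the file, opened it in Excel, and saved it in the newer XLSX format. Now, assuming the file is in Downloads, we can read it into R: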

library(tidyverse)
library(readxl)
tableS2 <- read_excel("~/Downloads/NIHMS471461-supplement-3.xlsx", 
                      sheet = "Table S2")

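Now, we read the figure legend:

Each dot corresponds to a tumor-normal pair, with vertical position indicating the total frequency of somatic mutations in the exome. Tumor types are ordered by their median somatic mutation frequency...

In our file, the pairs correspond to name, the total frequency is n_coding_mutations, and the somatic mutation frequency is coding_mutation_rate. So we want to:

group by tumor_type
calculate the median of coding_mutation_rate
order the values of n_coding_mutations within tumor_type
order tumor_type by median coding_mutation_rate

Then plot the ordered total frequency against sample, grouped by the ordered tumor type.

It might look like this: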

tableS2 %>% 
  group_by(tumor_type) %>% 
  # per-tumor-type median of the mutation counts (drawn as the red line)
  mutate(median_n = median(n_coding_mutations)) %>% 
  # sort samples within each tumor type and number them for the x axis
  arrange(tumor_type, coding_mutation_rate) %>% 
  mutate(idx = row_number()) %>% 
  # sort tumor types by median and freeze that order as factor levels
  arrange(median_n) %>% 
  ungroup() %>% 
  mutate(tumor_type = factor(tumor_type, 
                             levels = unique(tumor_type))) %>% 
  ggplot(aes(idx, n_coding_mutations)) + 
    geom_point() + 
    # one panel per tumor type, with the labels moved below the panels
    facet_grid(~tumor_type,
               switch = "x") + 
    scale_y_log10() + 
    geom_hline(aes(yintercept = median_n), 
               color = "red") + 
    theme_minimal() + 
    theme(strip.text.x = element_text(angle = 90), 
          axis.title.x = element_blank(), 
          axis.text.x = element_blank())

Result:

Looks quite close to the original image.
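
To try the approach without downloading the supplementary file, here is a minimal sketch on made-up data. The data frame toy, its three tumor types A, B and C, and their mutation counts are invented for illustration and are not from the paper. The sketch also differs from the code above in three ways, all of them my assumptions: samples are ordered by n_coding_mutations because the toy table has no rate column, the median line is drawn once per panel from a small summary table, and scales = "free_x" lets panels with different sample counts fill their width.

```r
library(dplyr)
library(ggplot2)

# Made-up stand-in for Table S2 (NOT real data): three hypothetical
# tumor types with deterministic, roughly log-spaced mutation counts.
toy <- data.frame(
  tumor_type = rep(c("A", "B", "C"), times = c(30, 50, 40)),
  n_coding_mutations = round(c(
    exp(seq(3, 7, length.out = 30)),
    exp(seq(1, 5, length.out = 50)),
    exp(seq(2, 6, length.out = 40))
  ))
)

toy_ordered <- toy %>%
  group_by(tumor_type) %>%
  # median count per tumor type
  mutate(median_n = median(n_coding_mutations)) %>%
  # sort samples within each tumor type, then number them
  arrange(tumor_type, n_coding_mutations) %>%
  mutate(idx = row_number()) %>%
  # sort tumor types by median and keep that order as factor levels
  arrange(median_n) %>%
  ungroup() %>%
  mutate(tumor_type = factor(tumor_type, levels = unique(tumor_type)))

# one row per tumor type, so each panel gets a single median line
medians <- distinct(toy_ordered, tumor_type, median_n)

p <- ggplot(toy_ordered, aes(idx, n_coding_mutations)) +
  geom_point() +
  geom_hline(data = medians, aes(yintercept = median_n), color = "red") +
  facet_grid(~tumor_type, switch = "x", scales = "free_x") +
  scale_y_log10() +
  theme_minimal() +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank())

print(p)
```

With these toy values the panels should come out in the order B, C, A, that is, from lowest to highest median.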
