Using ggsurvplot_facet() inside a function

Posted: 2019-04-16 16:12:23

Tags: r ggplot2 facet survival-analysis color-palette

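I am trying to use the ggsurvplot_facet() function to plot survival curves for several variables, faceted by the variable sex. When I apply the code to a single fitted model, it works fine. However, when I try to use the same code inside a function or a for loop, it does not plot all the survival curves it should and returns an error. If ggsurvplot_facet() accepted a list of survfit objects as input, the way ggsurvplot() does, I would do all of this plotting in ggsurvplot_facet() itself, but ggsurvplot_facet() only accepts a single survfit object at a time.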

I am running the code in RStudio on a 2018 MacBook Pro with macOS High Sierra.

Consider the following dataset: http://s000.tinyupload.com/index.php?file_id=01704535336107726906

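It contains observations from multiple visits of 100 subjects, for 4 different variables. Two of the variables (variable1 and variable2) can take two different values (0 or 1), and the other two (variable3 and variable4) can take three different values (0, 1 or 2).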

I started with the variables that can take two values, and wrote the following code:

# Load libraries
require(mgcv)
require(msm)
library(dplyr)
library(grDevices)
library(survival)
library(survminer)


# Set working directory
dirname<-dirname(rstudioapi::getSourceEditorContext()$path)
setwd(dirname)


load("ggsurvplot_facet_error.rda")


fit_test <- survfit(
  Surv(follow_up, as.numeric(status)) ~ (sex + variable1), data = data)

plot_test <- ggsurvplot_facet(fit_test,
                                     data = data,
                                     pval = TRUE,
                                     conf.int = TRUE,
                                     surv.median.line = "hv", # Specify median survival
                                     break.time.by = 1,
                                     facet.by = "sex",
                                     ggtheme = theme_bw(), # Change ggplot2 theme
                                     palette = "aaas",
                                     legend = "bottom",
                                     xlab = "Time (years)",
                                     ylab = "Death probability",
                                     panel.labs = list(sex_recoded=c("Male", "Female")),
                                     legend.labs = c("A", "B")
) 

plot_test

This code works well and produces the following plot:

[Image: the resulting survival curves, faceted by sex]

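However, when I try to turn this code into a function or a for loop, so that the same code is applied to both variable1 and variable2, I always get an error in the color/palette part of the plotting step.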

# Variables_with_2_categories:  variable1 and variable2
two <- c("variable1", "variable2")

## TEST #1: USING A FUNCTION

fit_plot_function <- function(x) {

# FIT part of the function
  two.i <- two[i]

fit_temp <- survfit(Surv(as.numeric(follow_up), as.numeric(status)) ~ 
                        sex + eval(as.name(paste0(two.i))), data = data)

# PLOT part of the function
  plot_temp <- ggsurvplot_facet(fit_temp,
                                data = data,
                                pval = TRUE,
                                conf.int = TRUE,
                                surv.median.line = "hv", # Specify median survival
                                break.time.by = 1,
                                facet.by = "sex",
                                ggtheme = theme_bw(), # Change ggplot2 theme
                                palette = "aaas",
                                legend = "bottom",
                                xlab = "Time (years)",
                                ylab = "Death probability",
                                panel.labs = list(sex_recoded=c("Male", "Female")),
                                legend.labs = rep(c("A", "B"),2)
  ) 
}


fit_plot_function(two)
# Warning message:
#  Now, to change color palette, use the argument palette= 
#  'eval(as.name(paste0(two.i)))' instead of color = 'eval(as.name(paste0(two.i)))' 

print(plot_temp)

# Error in grDevices::col2rgb(colour, TRUE) : 
#  invalid color name 'eval(as.name(paste0(two.i)))'

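It seems that the variable name is not recognized when it is evaluated from a name stored in the vector. Exactly the same thing happens with a for loop: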

## TEST #2: USING A FOR LOOP

n.two <- length(two)

for(i in 1:n.two) {
  two.i <- two[i]

  fit_temp <- survfit(Surv(as.numeric(follow_up), as.numeric(status)) ~ 
                        (sex + eval(as.name(paste0(two.i)))), data = data)



  plot_temp <- ggsurvplot_facet(fit_temp,
                                data = data,
                                pval = TRUE,
                                conf.int = TRUE,
                                surv.median.line = "hv", # Specify median survival
                                break.time.by = 1,
                                facet.by = "sex",
                                ggtheme = theme_bw(), # Change ggplot2 theme
                                palette = "aaas",
                                legend = "bottom",
                                xlab = "Time (years)",
                                ylab = "Death probability",
                                panel.labs = list(sex_recoded=c("Male", "Female")),
                                legend.labs = rep(c("A", "B"),2)
    ) 
}

print(plot_temp)

# ERROR: Now, to change color palette, use the argument palette= 'eval(as.name(paste0(two.i)))' 
# instead of color = 'eval(as.name(paste0(two.i)))
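The warning hints at the cause: ggsurvplot() ends up treating the literal text `eval(as.name(paste0(two.i)))` as the name of the grouping variable. That text is what survfit() records as the term label of the formula, and it is also what appears in the strata names. The sketch below demonstrates this on the `lung` dataset shipped with the survival package (a stand-in for the linked data, with `ph.ecog` playing the role of variable1). It also shows one common fix, assumed rather than tested with survminer here: build the formula from strings with base R's reformulate(), so that the term label is the real column name.

```r
library(survival)

# Column name held in a string, like `two.i` in the question
var_name <- "ph.ecog"

# Original approach: the term label is the literal expression, not the column name
f_bad <- Surv(time, status) ~ sex + eval(as.name(var_name))
attr(terms(f_bad), "term.labels")
# "sex" "eval(as.name(var_name))"

# Build the formula from strings instead. The response is passed as a call
# (quote()) rather than a string so reformulate() treats it as Surv(...),
# not as a single backquoted variable name, on older and newer R versions.
f_good <- reformulate(c("sex", var_name), response = quote(Surv(time, status)))
attr(terms(f_good), "term.labels")
# "sex" "ph.ecog"

fit_bad  <- survfit(f_bad,  data = lung)
fit_good <- survfit(f_good, data = lung)
head(names(fit_bad$strata), 2)   # strata names repeat the eval(...) expression
head(names(fit_good$strata), 2)  # strata names use the column name ph.ecog
```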

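As a side note, it would be nice if I could apply the same code to variables with either two or three different values, without having to write a different function for each.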
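Once the formula is built from a string, the number of levels of the grouping variable no longer matters, so one helper can cover both cases. A minimal sketch, assuming the `lung` data from the survival package and a hypothetical helper `fit_by_variable()`. The line that stores the evaluated formula in `fit$call` is a common workaround for survminer functions that re-evaluate the call (for example when computing p-values); survminer's own surv_fit() wrapper is an alternative worth checking in its documentation.

```r
library(survival)

# Hypothetical helper: one survfit() model per grouping variable, stratified by sex.
# `var_name` is a column name given as a string; it may have any number of levels.
fit_by_variable <- function(var_name, data) {
  f <- reformulate(c("sex", var_name), response = quote(Surv(time, status)))
  fit <- survfit(f, data = data)
  # Replace the symbol `f` in the stored call with the formula itself, so code
  # that re-evaluates fit$call outside this function can still find it.
  fit$call$formula <- f
  fit
}

# `age_over_65` is a hypothetical 2-level column derived only for illustration;
# ph.ecog has more than two levels.
lung2 <- lung
lung2$age_over_65 <- as.integer(lung2$age > 65)
vars <- c("age_over_65", "ph.ecog")

fits <- lapply(vars, fit_by_variable, data = lung2)
names(fits) <- vars
lapply(fits, function(fit) names(fit$strata))
```

Each element of `fits` could then be passed to ggsurvplot_facet() with `data = lung2` and `facet.by = "sex"`, one call per variable.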

Thank you very much for your help,

Best regards,

Casein

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] survminer_0.4.3.999 ggpubr_0.2          magrittr_1.5        ggplot2_3.1.1       survival_2.44-1.1  
[6] dplyr_0.8.0.1       msm_1.6.7           mgcv_1.8-27         nlme_3.1-137       

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1        pillar_1.3.1      compiler_3.5.1    plyr_1.8.4        tools_3.5.1       digest_0.6.18    
 [7] tibble_2.1.1      gtable_0.3.0      lattice_0.20-38   pkgconfig_2.0.2   rlang_0.3.4       Matrix_1.2-17    
[13] ggsci_2.9         rstudioapi_0.10   cmprsk_2.2-7      yaml_2.2.0        mvtnorm_1.0-10    expm_0.999-4     
[19] xfun_0.6          gridExtra_2.3     knitr_1.22        withr_2.1.2       survMisc_0.5.5    generics_0.0.2   
[25] grid_3.5.1        tidyselect_0.2.5  data.table_1.12.2 glue_1.3.1        KMsurv_0.1-5      R6_2.4.0         
[31] km.ci_0.5-2       purrr_0.3.2       tidyr_0.8.3       scales_1.0.0      backports_1.1.4   splines_3.5.1    
[37] assertthat_0.2.1  xtable_1.8-3      colorspace_1.4-1  labeling_0.3      lazyeval_0.2.2    munsell_0.5.0    
[43] broom_0.5.2       crayon_1.3.4      zoo_1.8-5   

1 answer

Answer 0 (score: 0)

Time to go tidy! You can do all of this with purrr. You can read about making ggplot2 plots with purrr here, and find more examples here.

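First, we need to convert your data to long format with tidyr::gather. We keep every column of the data frame except variable1, 2, 3 and 4, which get melted into two columns: num (the variable name) and variable (its value).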

library(tidyr)
library(dplyr)
library(purrr)

data %>% 
  gather(num, variable, -sample_id,  -sex,
         -visit_number, -age_at_enrollment,
         -follow_up, -status) %>% 
  mutate(num2 = num) %>% # We'll need this column later for the titles
  as_tibble() -> long_data


# A tibble: 2,028 x 8
   sample_id   sex    visit_number age_at_enrollment follow_up status num       variable
   <fct>       <fct>  <fct>                    <dbl>     <dbl> <fct>  <chr>        <int>
 1 sample_0001 Female 1                         56.7     0     1      variable1        0
 2 sample_0001 Female 2                         57.7     0.920 1      variable1        0
 3 sample_0001 Female 3                         58.6     1.90  1      variable1        0
 4 sample_0001 Female 4                         59.7     2.97  2      variable1        0
 5 sample_0001 Female 5                         60.7     4.01  1      variable1        0
 6 sample_0001 Female 6                         61.7     4.99  1      variable1        0
 7 sample_0002 Female 1                         55.9     0     1      variable1        1
 8 sample_0002 Female 2                         56.9     1.04  1      variable1        1
 9 sample_0002 Female 3                         58.0     2.15  1      variable1        1
10 sample_0002 Female 4                         59.0     3.08  1      variable1        1
# ... with 2,018 more rows
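For comparison, the same wide-to-long reshaping can be sketched in base R without tidyr. This is a swapped-in technique, not the answer's code, and it uses survival's `lung` data with a hypothetical two-level column `age_over_65` in place of the linked dataset: each grouping column becomes a block of rows, with `num` holding the column name and `variable` holding its value.

```r
library(survival)

wide <- lung[, c("time", "status", "sex", "ph.ecog")]
wide$age_over_65 <- as.integer(lung$age > 65)  # hypothetical 2-level column

grouping_vars <- c("ph.ecog", "age_over_65")
id_cols <- c("time", "status", "sex")

# One copy of the id columns per grouping variable, stacked on top of each other
long <- do.call(rbind, lapply(grouping_vars, function(v) {
  data.frame(wide[id_cols], num = v, variable = wide[[v]],
             stringsAsFactors = FALSE)
}))

head(long)
table(long$num)
```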

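Now we turn the long data frame into a nested data frame, and map over it! Be careful with ggsurvplot: it does not work with the tibbles created by nest(), so each nested tibble has to be converted to a data.frame first.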

long_data %>% 
  group_by(num) %>% 
  nest() %>% 
  mutate(
    # Run survfit() for every variable
    fit_f = map(data, ~survfit(Surv(follow_up, as.numeric(status)) ~ (sex + variable), data = .)),
    # Create survplot for every variable and survfit
    plots = map2(fit_f, data, ~ggsurvplot(.x,
                                          as.data.frame(.y), # Important! convert from tibble to data.frame 
                                          pval = TRUE,
                                          conf.int = TRUE,
                                          facet.by = "sex",
                                          surv.median.line = "hv", 
                                          break.time.by = 1,
                                          ggtheme = theme_bw(),
                                          palette = "aaas",
                                          xlab = "Time (years)",
                                          ylab = "Death probability") +
                   ggtitle(paste0("This is plot of ", .y$num2)) + # Add a title
                   theme(legend.position = "bottom"))) -> plots
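The nest-then-map pattern has a direct base R analogue, sketched here under the same assumptions as a plain-R alternative (survival's `lung` data and a hypothetical `age_over_65` column): split() gives a list of ordinary data.frames, so the tibble conversion is not needed; lapply() fits one model per variable like map(); and Map() pairs each fit with its own data like map2().

```r
library(survival)

# Small long-format table mirroring the gather() step, built from survival::lung
lung2 <- lung
lung2$age_over_65 <- as.integer(lung2$age > 65)  # hypothetical 2-level column
long <- do.call(rbind, lapply(c("ph.ecog", "age_over_65"), function(v) {
  data.frame(lung2[c("time", "status", "sex")], num = v, variable = lung2[[v]],
             stringsAsFactors = FALSE)
}))

# split() plays the role of group_by() + nest(): one plain data.frame per variable
by_var <- split(long, long$num)

# lapply() plays the role of map(): one survfit() per variable
fits <- lapply(by_var, function(d) {
  survfit(Surv(time, status) ~ sex + variable, data = d)
})

# Map() plays the role of map2(): each fit is paired with the data it came from
titles <- Map(function(fit, d) {
  paste0("This is plot of ", d$num[1], " (", length(fit$strata), " strata)")
}, fits, by_var)
titles
```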

Now you can display the plots by typing:

plots$plots[[1]]
plots$plots[[2]]
plots$plots[[3]] 
plots$plots[[4]] # plotted below

[Image: plots$plots[[4]]]

And save all the plots with map2():
map2(paste0(unique(long_data$num), ".pdf"), plots$plots, ggsave)
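Here map2() simply walks the file names and the plots in parallel. A base R sketch of the same idea with Map(), assuming base survival plots written to a temporary directory instead of ggplot objects (a substitution made so the example runs with the survival package alone):

```r
library(survival)

# One fit per variable (survival::lung; `age_over_65` is a hypothetical column)
lung2 <- lung
lung2$age_over_65 <- as.integer(lung2$age > 65)
fits <- list(
  ph.ecog     = survfit(Surv(time, status) ~ sex + ph.ecog, data = lung2),
  age_over_65 = survfit(Surv(time, status) ~ sex + age_over_65, data = lung2)
)

out_dir <- tempdir()
files <- file.path(out_dir, paste0(names(fits), ".pdf"))

# Map() pairs each file name with its fit, like purrr::map2()
invisible(Map(function(file, fit) {
  pdf(file)
  on.exit(dev.off())  # close the device even if plotting fails
  plot(fit, xlab = "Time", ylab = "Survival probability")
}, files, fits))
```

With the answer's ggplot objects, the equivalent call would be `Map(ggsave, files, plots$plots)`; purrr::walk2() is the purrr variant meant for side effects, which avoids printing the returned list.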

Update

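Unfortunately, I couldn't figure out how to change the legend labels. The only solution I can suggest is the following. Keep in mind that each plots$plots[[…]] is a ggplot object, so you can change anything afterwards. For example, to change the legend labels I only need to add scale_fill_discrete or scale_color_discrete (below, their ggsci aaas counterparts). Titles, labs, themes, etc. can be changed the same way.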

library(ggsci) # to add aaas color palette

plots$plots[[3]] +
  labs(title = "Variable 3",
       subtitle = "You just have to be the best") +
  ggsci::scale_color_aaas(guide = "none") +   # hide the duplicate color legend
  ggsci::scale_fill_aaas(labels = LETTERS[1:3])  # relabel the remaining legend
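An alternative that avoids editing the scales afterwards, sketched here as an assumption that has not been checked against survminer's facet layout, is to recode the variable as a labelled factor before fitting. The labels then appear in the strata names of the survfit object, which survminer uses by default to build legend labels. The example recodes `ph.ecog` from the survival package's `lung` data; the labels A to D are placeholders.

```r
library(survival)

# Map the codes 0-3 of ph.ecog to placeholder labels before fitting
lung2 <- lung
lung2$ph.ecog_lab <- factor(lung2$ph.ecog, levels = 0:3,
                            labels = c("A", "B", "C", "D"))

fit <- survfit(Surv(time, status) ~ sex + ph.ecog_lab, data = lung2)

# Strata names now carry the new labels, e.g. "sex=1, ph.ecog_lab=A"
names(fit$strata)
```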

[Image: the customized plot for variable 3]