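Modifying an R script to take command-line arguments for use in Snakemake

Time: 2019-04-03 02:19:40

Tags: r bioinformatics snakemake

I wrote this small R script to plot DNA sequence coverage data. It takes every file in a directory as input.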

coverage.files<-list.files("~/coverage_plotting", full.names = TRUE, pattern = ".txt")
coverage.names<-list.files("~/coverage_plotting", full.names = F, pattern=".txt")
pdf.files <- gsub("txt","pdf", coverage.files)
plot.colors <- c("red","blue","green","yellow","purple")
for(i in 1:length(coverage.names)) {
  coverage <- read.delim(coverage.files[i])
  pdf(pdf.files[i], width = 5, height= 4)
  colnames(coverage) <- c("contig", "position", "coverage")
  contigs <- unique(coverage[,1])
  plot(-100,-100, xlim=c(0,800), ylim=c(0,500000), xlab="Coverage", ylab="Number of basepairs")
  for(j in contigs) {
    contig.cov <- subset(coverage,contig==j)
    cov.hist <- hist(contig.cov$coverage, breaks=seq(0,5000, by = 2), plot=F)
    points(cov.hist$mids, cov.hist$counts, type="p", col=plot.colors[j], pch=19, cex=0.5)
  }
  dev.off()
}

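I now want to include this script in a Snakemake file, so I want to change it to take a single file as input from the command line. I found commandArgs() and tried using it, and I also removed the outer loop, since only one file is passed in at a time. This is what I ended up with: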

coverage.file <- commandArgs()
pdf.file <- gsub("txt","pdf", coverage.file)
plot.colors <- c("red","blue","green","yellow","purple")
coverage <- read.delim(coverage.file)
pdf(pdf.file, width = 5, height= 4)
colnames(coverage) <- c("contig", "position", "coverage")
contigs <- unique(coverage[,1])
plot(-100,-100, xlim=c(0,800), ylim=c(0,500000), xlab="Coverage", ylab="Number of basepairs")
  for(j in contigs) {
    contig.cov <- subset(coverage,contig==j)
    cov.hist <- hist(contig.cov$coverage, breaks=seq(0,5000, by = 2), plot=F)
    points(cov.hist$mids, cov.hist$counts, type="p", col=plot.colors[j], pch=19, cex=0.5)
  }
  dev.off()

When I run it, I get the following error:

Error in file(file, "rt") : cannot open the connection
Calls: read.delim -> read.table -> file
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'coverage.file': No such file or directory
Execution halted

Does anyone have suggestions for how I should modify it to take a single input from the command line?

Thanks

1 answer:

Answer 0: (score: 0)

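The R documentation says that commandArgs() returns:

A character vector containing the name of the executable and the user-supplied command line arguments. The first element is the name of the executable by which R was invoked. The exact form of this element is platform dependent: it may be the fully qualified name, or simply the last component (or basename) of the application, or for an embedded R it can be anything the programmer supplied. If trailingOnly = TRUE, a character vector of those arguments (if any) supplied after --args.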

See https://www.rdocumentation.org/packages/base/versions/3.0.3/topics/commandArgs
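To see the difference in practice, here is a minimal sketch. The script name show_args.R and the argument sample1.txt are made up for illustration and are not part of the original question.

```r
# show_args.R (hypothetical file name)
# Run with: Rscript show_args.R sample1.txt

# Everything R was started with, including the executable name
all.args <- commandArgs()

# Only the arguments supplied after --args, i.e. the user's own arguments
user.args <- commandArgs(trailingOnly = TRUE)

cat("commandArgs():\n")
print(all.args)
cat("commandArgs(trailingOnly = TRUE):\n")
print(user.args)
```

When run through Rscript, the first vector typically also contains the path to the R executable and entries such as --file=show_args.R and --args, while the second contains only sample1.txt.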

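So the object coverage.file is a character vector with several elements rather than a single file name, and you should access the argument by its position in the vector. For example: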

args <- commandArgs(trailingOnly=TRUE)
# access the i-th argument, depending on how you write the shell command in the Snakemake rule, e.g.:
coverage.file <- args[1]
...
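Building on the snippet above, one way to make the argument handling a bit more robust is sketched below. The helper name parse.args, the script name plot_coverage.R, and the optional second argument for the output PDF are assumptions added for illustration; they do not come from the question.

```r
# plot_coverage.R (hypothetical file name)
# Intended call: Rscript plot_coverage.R <coverage.txt> [output.pdf]

# parse.args is a hypothetical helper: it picks the input file from the
# first argument and the output PDF from the second, if one is given.
parse.args <- function(args = commandArgs(trailingOnly = TRUE)) {
  if (length(args) < 1) {
    stop("usage: Rscript plot_coverage.R <coverage.txt> [output.pdf]")
  }
  coverage.file <- args[1]
  # Without an explicit output name, swap a trailing .txt for .pdf
  pdf.file <- if (length(args) >= 2) {
    args[2]
  } else {
    sub("\\.txt$", ".pdf", coverage.file)
  }
  list(coverage.file = coverage.file, pdf.file = pdf.file)
}

# In the plotting script this would be used roughly as follows:
# files <- parse.args()
# coverage <- read.delim(files$coverage.file)
# pdf(files$pdf.file, width = 5, height = 4)
```

In a Snakemake rule, the shell command could then look something like Rscript plot_coverage.R {input} {output}, so that Snakemake controls both file names. Anchoring the pattern with \\.txt$ also avoids replacing a "txt" that happens to appear elsewhere in the path, which the plain gsub("txt", "pdf", ...) would do.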