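I have 507 folders from TCGA containing count data (htseq.counts.gz). With some help, I have collected them into a data.frame, as shown in the script below. Now I want to merge these files so that I can run a differential expression analysis: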
# Find all files
library(stringr)
short<-list.files("Desktop/TCGA_SquamousCell/", full.names = FALSE, recursive = TRUE)
# Find the last / in order to determine the start of the file name
start <- str_locate(short,"^.+/")
# Retrieve the file name from the complete directory path
name <- substring(short, start[,2]+1)
# Create a data frame holding each file name and its relative path
df <- data.frame(name, path=short, stringsAsFactors = FALSE)
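
One possible way to merge the files is sketched below in base R. It assumes each file is a headerless, tab-separated table of gene ID and count, that HTSeq summary rows start with "__" (for example "__no_feature"), and that every file lists the same genes in the same order. The helper names `read_counts` and `merge_counts`, and the toy files and gene IDs in the demonstration, are hypothetical and not part of the original script.

```r
# Hypothetical helper: read one HTSeq count file (gene ID <tab> count, no header).
# read.delim() reads gzip-compressed files transparently.
read_counts <- function(path) {
  x <- read.delim(path, header = FALSE, col.names = c("gene", "count"),
                  stringsAsFactors = FALSE)
  # Drop HTSeq summary rows such as "__no_feature" and "__ambiguous".
  x[!startsWith(x$gene, "__"), ]
}

# Hypothetical helper: combine the files listed in a data frame like `df`
# (columns `name` and `path`) into one gene-by-sample count matrix.
merge_counts <- function(files, root = ".") {
  per_file <- lapply(file.path(root, files$path), read_counts)
  genes <- per_file[[1]]$gene
  # Assumption: every file lists the same genes in the same order.
  same_genes <- vapply(per_file, function(x) identical(x$gene, genes), logical(1))
  stopifnot(all(same_genes))
  counts <- do.call(cbind, lapply(per_file, function(x) x$count))
  dimnames(counts) <- list(genes, files$name)
  counts
}

# Toy demonstration: two small gzipped files in a temporary directory.
root <- tempfile("counts_demo")
dir.create(file.path(root, "a"), recursive = TRUE)
dir.create(file.path(root, "b"))
write_demo <- function(rel, counts) {
  con <- gzfile(file.path(root, rel), "w")
  writeLines(paste(c("ENSG1", "ENSG2", "__no_feature"), counts, sep = "\t"), con)
  close(con)
}
write_demo("a/s1.htseq.counts.gz", c(5, 0, 10))
write_demo("b/s2.htseq.counts.gz", c(7, 3, 12))

demo_df <- data.frame(name = c("s1.htseq.counts.gz", "s2.htseq.counts.gz"),
                      path = c("a/s1.htseq.counts.gz", "b/s2.htseq.counts.gz"),
                      stringsAsFactors = FALSE)
count_matrix <- merge_counts(demo_df, root)
```

With the data frame built above, the call would presumably be `merge_counts(df, "Desktop/TCGA_SquamousCell")`, since `path` is relative to that folder. A genes-by-samples integer matrix like this is the general shape that count-based packages such as DESeq2 or edgeR take as input, although the sample annotation (for example tumor versus normal) still has to be supplied separately.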