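I have several files that are not tab-delimited. I want to merge them and create a single file containing some information from all of them.
I have tried the code below, but it does not work properly when I use a for loop. The original files look like this: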
Warning: Output file '02-MappedReads_HISAT2/sam_folder/SAMPLE01_unsorted_sample.sam' was specified without -S. This will not work in future HISAT 2 versions. Please use -S instead.
9437 reads; of these:
9437 (100.00%) were paired; of these:
310 (3.28%) aligned concordantly 0 times
8977 (95.13%) aligned concordantly exactly 1 time
150 (1.59%) aligned concordantly >1 times
----
310 pairs aligned concordantly 0 times; of these:
13 (4.19%) aligned discordantly 1 time
----
297 pairs aligned 0 times concordantly or discordantly; of these:
594 mates make up the pairs; of these:
306 (51.52%) aligned 0 times
282 (47.47%) aligned exactly 1 time
6 (1.01%) aligned >1 times
98.38% overall alignment rate
So I read the file with the read.table function:
(report_sample <- read.table(paste0(mapping_Folder, '/', 'SAMPLE01_summary.txt'), header = F, as.is = T, fill = TRUE, sep = ' ', skip = 1, blank.lines.skip = TRUE, text = TRUE))
(final <- data.frame('samples' = samples['1',1], 'Input_Read_Pairs' = report_sample[1,1], 'Mapped_reads' = report_sample[2,3], 'Mapped_reads_%' = report_sample[2,4], 'reads_unmapped' = report_sample[3,5], 'reads_unmapped_%' = report_sample[3,6], 'reads_uniquely_mapped' = report_sample[4,5], 'reads_uniquely_mapped_%' = report_sample[4,6]))
So the output looks like this:
samples Input_Read_Pairs Mapped_reads Mapped_reads_. reads_unmapped reads_unmapped_. reads_uniquely_mapped reads_uniquely_mapped_.
1 SAMPLE01 9437 9437 (100.00%) 310 (3.28%) 8977 (95.13%)
That works fine with a single file, but not when I use a for loop:
report_sample <- array(dim = 0)
for (i in samples[,1]) {
report_sample[i] <- read.table(paste0(mapping_Folder, '/', i,'_summary.txt'), header = F, as.is = T, fill = TRUE, sep = ' ', skip = 1, blank.lines.skip = TRUE, text = TRUE, )
}
final <- data.frame('samples' = samples['1',1], 'Input_Read_Pairs' = report_sample[1,1], 'Mapped_reads' = report_sample[2,3], 'Mapped_reads_%' = report_sample[2,4], 'reads_unmapped' = report_sample[3,5], 'reads_unmapped_%' = report_sample[3,6], 'reads_uniquely_mapped' = report_sample[4,5], 'reads_uniquely_mapped_%' = report_sample[4,6])
$SAMPLE01
[1] "9437" "" "" "" "" "" "" "these:"
[9] "" "time" "" "" "discordantly;" "" "pairs;" ""
[17] "0" "" "exactly" "" ">1" "98.38%"
$SAMPLE02
[1] "9437" "" "" "" "" "" "" "these:"
[9] "" "time" "" "" "discordantly;" "" "pairs;" ""
[17] "0" "" "exactly" "" ">1" "98.38%"
$SAMPLE03
[1] "9437" "" "" "" "" "" "" "these:"
[9] "" "time" "" "" "discordantly;" "" "pairs;" ""
[17] "0" "" "exactly" "" ">1" "98.43%"
Answer 0 (score: 0)
Your example is not 100% reproducible (what is samples?), so I am guessing a bit.
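One likely reason the loop in the question misbehaves is the single-bracket assignment `report_sample[i] <- read.table(...)`. With single brackets R stores only the first column of the data frame under that name (normally with a warning about the replacement length), which would explain why each `$SAMPLE0x` entry is a single character vector. The sketch below uses a made-up two-column data frame (`toy`, `bad` and `good` are illustrative names, not from the question) to show the difference from `[[ ]]`:

```r
# Toy data frame standing in for the result of read.table() (made-up values)
toy <- data.frame(a = c("x", "y"), b = c("1", "2"), stringsAsFactors = FALSE)

# Single brackets: only the first column survives (R would normally warn here)
bad <- list()
suppressWarnings(bad["s1"] <- toy)

# Double brackets: the whole data frame is stored as one list element
good <- list()
good[["s1"]] <- toy

str(bad[["s1"]])   # a character vector, i.e. column `a` only
str(good[["s1"]])  # a data frame with both columns
```

This is why the code below collects the tables in a list and assigns with `[[i]]`.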
TidyTable <- function(x) {
  # Pick the cells of interest out of one parsed summary table
  final <- data.frame('Input_Read_Pairs' = x[1,1], # add your "samples" column before this
                      'Mapped_reads' = x[2,3],
                      'Mapped_reads_%' = x[2,4],
                      'reads_unmapped' = x[3,5],
                      'reads_unmapped_%' = x[3,6],
                      'reads_uniquely_mapped' = x[4,5],
                      'reads_uniquely_mapped_%' = x[4,6])
  return(final)
}
report_sample <- list()
for (i in 1:3) { # change this to your "samples"
  # Use [[i]] so the whole data frame is stored, not just its first column
  report_sample[[i]] <- read.table(paste0(mapping_Folder, '/', "output", i, ".txt"),
                                   header = FALSE, as.is = TRUE, fill = TRUE, sep = ' ',
                                   skip = 1, blank.lines.skip = TRUE)
}
# Tidy each table, then stack the one-row results into a single data frame
df <- lapply(report_sample, TidyTable)
do.call("rbind", df)
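The cell positions used in TidyTable depend on how many spaces precede each line of the summary, so they can break if the indentation changes. As an alternative (a different technique from the one above, not a correction of it), here is a minimal sketch that finds the numbers by pattern instead of by position. `capture` and `parse_summary` are hypothetical helper names, the example lines are copied from the summary in the question, and the indentation I added to them is an assumption that does not affect the result:

```r
# Lines copied from the summary shown in the question
example <- c(
  "9437 reads; of these:",
  "  9437 (100.00%) were paired; of these:",
  "    310 (3.28%) aligned concordantly 0 times",
  "    8977 (95.13%) aligned concordantly exactly 1 time",
  "    150 (1.59%) aligned concordantly >1 times",
  "98.38% overall alignment rate"
)

# Hypothetical helper: capture groups of the first line that matches `pattern`
capture <- function(lines, pattern) {
  m <- regmatches(lines, regexec(pattern, lines))
  m <- m[lengths(m) > 0]
  if (length(m) == 0) stop("no line matches: ", pattern)
  m[[1]][-1]  # drop the full match, keep only the groups
}

# Hypothetical helper: one-row data frame for one summary
parse_summary <- function(lines, sample) {
  count_pct <- "([0-9]+) \\(([0-9.]+%)\\)"
  total    <- capture(lines, "([0-9]+) reads; of these:")
  paired   <- capture(lines, paste(count_pct, "were paired"))
  unmapped <- capture(lines, paste(count_pct, "aligned concordantly 0 times"))
  unique1  <- capture(lines, paste(count_pct, "aligned concordantly exactly 1 time"))
  data.frame(
    samples                   = sample,
    Input_Read_Pairs          = as.integer(total[1]),
    Mapped_reads              = as.integer(paired[1]),
    "Mapped_reads_%"          = paired[2],
    reads_unmapped            = as.integer(unmapped[1]),
    "reads_unmapped_%"        = unmapped[2],
    reads_uniquely_mapped     = as.integer(unique1[1]),
    "reads_uniquely_mapped_%" = unique1[2],
    check.names = FALSE,       # keep the "%" in the column names
    stringsAsFactors = FALSE
  )
}

# Here the same example is reused for two sample names just to show the stacking
summaries <- list(parse_summary(example, "SAMPLE01"),
                  parse_summary(example, "SAMPLE02"))
final <- do.call(rbind, summaries)
print(final)
```

With real files, each element of `summaries` would come from something like `parse_summary(readLines(file.path(mapping_Folder, paste0(s, "_summary.txt"))), s)` for each `s` in `samples[, 1]`, assuming the file naming from the question.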