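I used the function from this answer to read multiple files and build a single data table. I would like the file names to become separate columns, and for every smallRNA that does not occur in one of the other files, the corresponding cell should be filled with 0.
Part of the dataset: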
dput(dt[1:4])
structure(list(FileName = c("Sample_4C_NaIO4", "Sample_4C_NaIO4",
"Sample_4C_NaIO4", "Sample_4C_NaIO4"), smallRNA = c("TCGTACGACTCTTAGCGG",
"GTACGACTCTTAGCGG", "CTCGTACGACTCTTAGCGG", "CGTACGACTCTTAGCGG"
), counts = c(4166178L, 564940L, 89932L, 52670L)), class = c("data.table",
"data.frame"), row.names = c(NA, -4L), .internal.selfref = <pointer: 0x180a460>)
My code:
temp <- list.files(pattern = ".txt")
dt <- rbindlist( sapply(temp,fread,simplify=FALSE),
use.names = TRUE, idcol = "FileName")
dt$FileName <- gsub(".txt","",dt$FileName)
finaldt <- dcast.data.table(dt, smallRNA+counts ~FileName,
drop=FALSE,fill=0)
Result:
finaldt <- dcast.data.table(dt,smallRNA+counts ~ FileName,drop = FALSE,fill = 0)
Using 'counts' as value column. Use 'value.var' to override
Error in CJ(smallRNA = c("AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACAAA", "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGAAG", :
Cross product of elements provided to CJ() would result in 70585808594 rows which exceeds .Machine$integer.max == 2147483647
I considered using the bit64 package, but I am not sure how.
Version:
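To put the number in the error message in context, here is a minimal sketch on hypothetical toy data (the values and the names `n_lhs` and `too_big` are made up for illustration). It assumes, as the `CJ()` call in the error suggests, that with `drop = FALSE` the left-hand side `smallRNA + counts` is expanded to every combination of unique `smallRNA` and unique `counts` values.

```r
library(data.table)

# Hypothetical toy data with the same three columns as dt
toy <- data.table(
  FileName = c("A", "A", "B", "B"),
  smallRNA = c("s1", "s2", "s2", "s3"),
  counts   = c(10L, 20L, 30L, 40L)
)

# Size of the cross product of the two left-hand-side variables.
# as.numeric() avoids integer overflow when the product is large.
n_lhs <- as.numeric(uniqueN(toy$smallRNA)) * uniqueN(toy$counts)
n_lhs

# The row count reported in the error message, compared with the integer limit
too_big <- 70585808594 > .Machine$integer.max
too_big
```

If that assumption holds, the row count grows with the product of the two unique counts, so the problem would be the formula rather than the integer type.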
version
_
platform x86_64-pc-linux-gnu
arch x86_64
os linux-gnu
system x86_64, linux-gnu
status
major 3
minor 5.1
year 2018
month 07
day 02
svn rev 74947
language R
version.string R version 3.5.1 (2018-07-02)
nickname Feather Spray
Edit
The last part of the code had to be changed to:
finaldt <- dcast.data.table(dt, smallRNA ~ FileName,
drop = FALSE, fill = 0, value.var = "counts")
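As a small illustration of what the corrected call produces, here is a sketch on hypothetical toy data (the file names `A` and `B` and all values are invented, and `wide` is just an illustrative name). Note that `value.var` takes the column name as a string.

```r
library(data.table)

# Hypothetical toy data: s1 only occurs in file A, s3 only in file B
toy <- data.table(
  FileName = c("A", "A", "B", "B"),
  smallRNA = c("s1", "s2", "s2", "s3"),
  counts   = c(10L, 20L, 30L, 40L)
)

# One row per smallRNA, one column per file; missing combinations become 0
wide <- dcast(toy, smallRNA ~ FileName, value.var = "counts", fill = 0)
print(wide)
```

Here `s1` gets 0 in column `B` and `s3` gets 0 in column `A`, which is the layout described at the top of the question.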
Edit 2: problem with values smaller than 1
In the combined dataset "dt" there are no values smaller than 1:
filter(dt,counts<1)
[1] FileName smallRNA counts
<0 rows> (or 0-length row.names)
> myfiles[[1]] %>% filter(counts<1) %>% tail()
# A tibble: 6 x 2
smallRNA counts
<chr> <dbl>
1 ENST00000592744.1 ncrna chromosome:GRCh38:9:81946438:81976806:-1 gene:ENSG00000267559… 0.00106
2 ENST00000594089.1 ncrna chromosome:GRCh38:11:64778954:64779405:1 gene:ENSG00000269038… 0.00106
3 ENST00000607991.1 ncrna chromosome:GRCh38:22:38743495:38743910:1 gene:ENSG00000273076… 0.00106
4 ENST00000608972.1 ncrna chromosome:GRCh38:7:29008926:29010252:1 gene:ENSG00000272568.… 0.00106
5 ENST00000618845.1 ncrna chromosome:GRCh38:14:49863072:49864379:1 gene:ENSG00000278002… 0.00106
6 ENST00000625800.1 ncrna chromosome:GRCh38:CHR_HG2232_PATCH:233205199:233205479:1 gene… 0.00106
Is it possible to include these values as well?
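As a possible starting point for debugging, the sketch below uses hypothetical stand-ins for two input files (the list `files`, its element names, and all values are invented) to check whether fractional counts survive `rbindlist` and `dcast` when one file holds integer counts and another holds doubles. It does not reproduce the real files, so it only shows that this pipeline can carry values below 1 in principle.

```r
library(data.table)

# Hypothetical stand-ins for two files: one with integer counts, one with doubles
files <- list(
  file_int = data.table(smallRNA = c("s1", "s2"), counts = c(5L, 7L)),
  file_dbl = data.table(smallRNA = c("s2", "s3"), counts = c(2, 0.00106))
)

# Column type of counts in each input
sapply(files, function(x) class(x$counts))

# Bind as in the question; the list names become the FileName column
combined <- rbindlist(files, use.names = TRUE, idcol = "FileName")
class(combined$counts)

# Rows with counts below 1 after binding
small <- combined[counts < 1]
print(small)

# Reshape to wide format; fractional values are kept as doubles
wide <- dcast(combined, smallRNA ~ FileName, value.var = "counts", fill = 0)
print(wide)
```

If the same check on the real data shows no fractional values right after reading, the difference would more likely lie in which files are read or how, rather than in the reshaping step.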