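I have a large data frame with dimensions 42500 x 18500. I am trying to replace some characters in it, but it is too large to convert to a matrix. Could someone help me? Thanks. On my computer, the script below only works on smaller data frames (e.g. 42500 x 3000).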
geno <- read.table("datatable.txt", sep="\t", head=TRUE)
M <- as.matrix(geno[,c(3:dim(geno)[2])])
M[M == "U"] <- "N"                     ## Replace "U" with "N"
H <- which(M == "H", arr.ind=TRUE)     ## Identify the Hs
M[H] <- geno[cbind(H[, "row"], 2)]     ## Replace with H values from "type" column
dat <- cbind(geno[1], M)
The data frame looks like this:
SNP_ID Type Line1 Line2 Line3 Line4 Line5 Line6
SNP1   K    T     G     T     U     T     T
SNP2   M    A     U     A     A     H     C
SNP3   M    A     A     A     C     A     A
SNP4   K    T     H     T     G     T     T
SNP5   K    U     T     T     T     T     H
SNP6   M    A     U     A     A     C     A
When I run the script above on the whole data frame, I get this error:
Error: cannot allocate vector of size 8.0 Gb
In addition: Warning messages:
1: In structure(.Call(C_objectSize, x), class = "object_size") :
Reached total allocation of 24573Mb: see help(memory.size)...
Answer 0 (score: 2)
You can do this elegantly with mutate_each_q from the dplyr package:
library(dplyr)
geno %>% mutate_each_q(funs(ifelse(.=="U","N",ifelse(.=="H",Type,.))),names(geno)[-(1:2)])
SNP_ID Type Line1 Line2 Line3 Line4 Line5 Line6
1 SNP1 K T G T N T T
2 SNP2 M A N A A M C
3 SNP3 M A A A C A A
4 SNP4 K T K T G T T
5 SNP5 K N T T T T K
6 SNP6 M A N A A C A
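As far as I know, mutate_each_q and funs() come from older dplyr releases and may not be available in a current installation. Here is a minimal sketch of the same replacement, assuming dplyr 1.0.0 or later (where across() exists). The inline sample data and the name res are only for illustration:

```r
library(dplyr)

# Sample data from the question; colClasses = "character" keeps the
# single-letter genotype codes as plain strings.
geno <- read.table(text = "
SNP_ID Type Line1 Line2 Line3 Line4 Line5 Line6
SNP1 K T G T U T T
SNP2 M A U A A H C
SNP3 M A A A C A A
SNP4 K T H T G T T
SNP5 K U T T T T H
SNP6 M A U A A C A", header = TRUE, colClasses = "character")

# Apply the replacement to every column except SNP_ID and Type.
# Inside the lambda, .x is the current column and Type is the Type column.
res <- geno %>%
  mutate(across(-c(SNP_ID, Type),
                ~ ifelse(.x == "U", "N", ifelse(.x == "H", Type, .x))))

print(res)
```

This works one column at a time, so it does not need the full character matrix that as.matrix() builds in the question. Whether that is enough for a 42500 x 18500 data frame on a given machine is something I have not tested.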
Answer 1 (score: 1)
Try this.
d<-read.table(text="
SNP_ID Type Line1 Line2 Line3 Line4 Line5 Line6
SNP1 K T G T U T T
SNP2 M A U A A H C
SNP3 M A A A C A A
SNP4 K T H T G T T
SNP5 K U T T T T H
SNP6 M A U A A C A", header=TRUE, colClasses="character")
# Replace every "U" with "N"
d[which(d[,1:dim(d)[2]] == "U", arr.ind=TRUE)] <- "N"
# Replace every "H" with the Type value from the same row
d[which(d[,1:dim(d)[2]] == "H", arr.ind=TRUE)] <-
  d[which(d[,1:dim(d)[2]] == "H", arr.ind=TRUE)[,'row'], 'Type']
Output:
SNP_ID Type Line1 Line2 Line3 Line4 Line5 Line6
1 SNP1 K T G T N T T
2 SNP2 M A N A A M C
3 SNP3 M A A A C A A
4 SNP4 K T K T G T T
5 SNP5 K N T T T T K
6 SNP6 M A N A A C A
Edit: you may want to try importing the data as follows, to make sure you are not reading in factors, which could use unnecessary memory:
d <- read.table("datatable.txt", sep="\t", header=TRUE, colClasses="character")
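If memory is still a problem after that, one option is to do the replacement one column at a time, so that no full-size comparison matrix is ever built. This is only a rough sketch, assuming every column was read in as character and that the first two columns are SNP_ID and Type, as in the sample data. I have not tried it on the full 42500 x 18500 table:

```r
# Sample data from the question, read in as character columns.
d <- read.table(text = "
SNP_ID Type Line1 Line2 Line3 Line4 Line5 Line6
SNP1 K T G T U T T
SNP2 M A U A A H C
SNP3 M A A A C A A
SNP4 K T H T G T T
SNP5 K U T T T T H
SNP6 M A U A A C A", header = TRUE, colClasses = "character")

# Loop over the genotype columns only (everything after SNP_ID and Type).
for (j in seq_along(d)[-(1:2)]) {
  col <- d[[j]]
  # which() drops NA comparisons, so missing values are left untouched.
  col[which(col == "U")] <- "N"
  h <- which(col == "H")
  col[h] <- d$Type[h]   # take the Type value from the same row
  d[[j]] <- col
}

print(d)
```

Each pass only touches a single column of 42500 values, so the temporary vectors stay small regardless of how many columns there are.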