Question

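What I want to do:
1. Read the file contents into a matrix (with two features/columns: ID and text).
2. Collapse the rows that share the same ID or, if that is not possible, create a new matrix holding the collapsed data.
3. Output .txt files in the working directory, with the ID as the file name and the text as the content.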

Here is what I have done so far:

#set working directory and get file_list
myvar <- matrix(0,nrow=0,ncol=2)
colnames(myvar) <- c("PID","Seq")

for(file in file_list)
{
    print(file)
    Mymatrix <- as.matrix(read.table(file))

    for(i in 1:length(Mymatrix[,1]))
    {
        if(Mymatrix[i,1] %in% myvar[,1])
        {
            myvar[which(myvar[,1] == Mymatrix[i,1]) ,2] <- paste(myvar[which(myvar[,1] == Mymatrix[i,1]),2],Mymatrix[i,2])
        }else{
            myvar <- rbind(myvar,c(Mymatrix[i,1],Mymatrix[i,2]))
        }
    }
}

This has a performance problem; see the profvis output here: profvis results

Here is a reproducible example:

#Input:
a <- matrix(0,ncol=2, nrow=0)
colnames(a) <- c("id","text")

#possible data in the matrix after reading one file
a <- rbind(a,c(1,"4 5 7 7 8 1"))
a <- rbind(a,c(1,"5 5 1 3 7 5 1"))
a <- rbind(a,c(7,"5 5 1 3 7 5 1"))
a <- rbind(a,c(5,"1 3 2 25 5 1 3 7 5 1"))

#expected output after processing

   > a
     id  text                       
[1,] "1" "4 5 7 7 8 1 5 5 1 3 7 5 1"
[2,] "7" "5 5 1 3 7 5 1"            
[3,] "5" "1 3 2 25 5 1 3 7 5 1"

Note: the order of the text must be preserved after the rows are collapsed (for ID = 1, 4 5 7 7 8 1 is followed by 5 5 1 3 7 5 1).

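As mentioned, the biggest problem is performance: my current approach takes a lot of time. Is there a solution using something like aggregate or apply?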

Answer 1

Here is a solution using aggregate with paste and collapse = " ", as suggested by @alexis-laz:

# convert matrix to data.frame and aggregate by id
dfAgg <- aggregate(text ~ id, data=data.frame(a), FUN=paste, collapse=" ")

# coerce dfAgg to matrix
as.matrix(dfAgg)
     id  text                       
[1,] "1" "4 5 7 7 8 1 5 5 1 3 7 5 1"
[2,] "5" "1 3 2 25 5 1 3 7 5 1"     
[3,] "7" "5 5 1 3 7 5 1"

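Note that the explicit conversion to a data.frame is not strictly needed in this example, because R will perform the coercion automatically. Still, making the coercion explicit seems like good programming practice.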
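The aggregate result above comes back sorted by id (1, 5, 7), while the expected output in the question lists the ids in order of first appearance (1, 7, 5). If that order matters, one possible sketch is shown below. It also covers the third step of the question, writing one .txt file per id. The helper names collapse_by_id and write_by_id are hypothetical and not part of the original answer, and the code assumes a character matrix with columns named id and text, as in the question's sample data.

```r
# Same sample data as in the question.
a <- rbind(
  c(1, "4 5 7 7 8 1"),
  c(1, "5 5 1 3 7 5 1"),
  c(7, "5 5 1 3 7 5 1"),
  c(5, "1 3 2 25 5 1 3 7 5 1")
)
colnames(a) <- c("id", "text")

# Hypothetical helper: collapse rows that share an id, keeping the ids in
# order of first appearance and the texts in their original row order.
collapse_by_id <- function(m) {
  ids <- unique(m[, "id"])
  # Fixing the factor levels to `ids` stops tapply() from sorting the groups.
  groups <- factor(m[, "id"], levels = ids)
  texts <- tapply(m[, "text"], groups, paste, collapse = " ")
  cbind(id = ids, text = unname(as.vector(texts)))
}

res <- collapse_by_id(a)
print(res)

# Hypothetical helper: write one <id>.txt file per row into `dir`.
write_by_id <- function(m, dir = getwd()) {
  for (i in seq_len(nrow(m))) {
    writeLines(m[i, "text"], file.path(dir, paste0(m[i, "id"], ".txt")))
  }
  invisible(NULL)
}

# A temporary directory is used here so the sketch does not touch the
# working directory; pass dir = getwd() to match the question.
out_dir <- tempdir()
write_by_id(res, out_dir)
```

With several input files, the per-file matrices could first be stacked with rbind so that the collapse runs once over all rows, rather than growing myvar row by row inside the loop.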

Performance issue when grouping data in R
