Combining grep and for loops to build a matrix (R)

Asked: 2019-04-17 15:18:54

Tags: r dataframe for-loop matrix

I have a large number of small data frames that I would like to combine meaningfully into one, but the logic of how to do this escapes me.

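For example, suppose I have a list of data frames that looks like this, although with many more files, a lot of which I do not want in my final data frame: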

MyList = c("AthosVersusAthos.csv", "AthosVersusPorthos.csv", "AthosVersusAramis.csv", "PorthosVersusAthos.csv", "PorthosVersusPorthos.csv", "PorthosVersusAramis.csv", "AramisVersusAthos.csv", "AramisVersusPorthos.csv", "AramisVersusAramis.csv", "BobVersusMary.csv", "LostCities.txt")

What I want is to combine them into one large data frame that looks like this:

                      |                      |
 AthosVersusAthos     | PorthosVersusAthos   | AramisVersusAthos
                      |                      |
 ------------------------------------------------------------------
                      |                      |
 AthosVersusPorthos   | PorthosVersusPorthos | AramisVersusPorthos
                      |                      |
 ------------------------------------------------------------------
                      |                      |
 AthosVersusAramis    | PorthosVersusAramis  | AramisVersusAramis
                      |                      |

Or, more accurately (sample numbers are shown in only one block of the matrix):

           |       Athos      |      Porthos       |    Aramis
    -------|------------------------------------------------------
           | 10     9      5  |                    |
    Athos  | 2      10     4  |                    | 
           | 3      0      10 |                    |
    -------|------------------------------------------------------
           |                  |                    |
   Porthos |                  |                    |                  
           |                  |                    |
    -------|------------------------------------------------------
           |                  |                    |
   Aramis  |                  |                    |                  
           |                  |                    |
    -------------------------------------------------------------

What I have managed so far is:

Musketeers = c("Athos", "Porthos", "Aramis")

  for(i in 1:length(Musketeers)) {
    for(j in 1:length(Musketeers)) {

    CombinedMatrix <- cbind (

      rbind(MyList[grep(paste0("^(", Musketeers[i],
      ")(?=.*Versus[", Musketeers[j], "]"), names(MyList),
      value = T, perl=T)])

  )
 }
}

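What I am trying to do is combine my grep command (which matters, given the number of files and how specifically they have to be selected) with rbind and cbind, so that the rows and columns of the matrix are concatenated in a meaningful way.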
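The selection step on its own can be sketched as follows. This is a minimal illustration, assuming the file names follow a `<First>Versus<Second>.csv` pattern; the vector `files` and the variables `pattern`, `hit` and `athos_first` are names made up for this example. Two details of the attempt above are worth noting: square brackets in a regular expression denote a character class, so `[Porthos]` matches any single one of those letters rather than the whole name, and `names()` returns NULL for an unnamed character vector.

```r
# Hypothetical file names following the <First>Versus<Second>.csv pattern
files <- c("AthosVersusAthos.csv", "AthosVersusPorthos.csv",
           "PorthosVersusAthos.csv", "BobVersusMary.csv", "LostCities.txt")

# Anchor the pattern at both ends so that only the exact pair matches
pattern <- paste0("^", "Athos", "Versus", "Porthos", "\\.csv$")
hit <- grep(pattern, files, value = TRUE)

# All files whose first name is Athos, whatever the second name is
athos_first <- grep("^AthosVersus", files, value = TRUE)

print(hit)
print(athos_first)
```

Building the pattern with `paste0()` keeps the two names as plain text, which avoids the character-class problem entirely.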

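My overall plan is to stack all data frames whose names start with "Athos" into one column (rbind), do the same for the data frames starting with "Porthos" and "Aramis", and then bind those three columns together side by side (cbind) into the final data frame.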
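That plan can be sketched on toy data before worrying about file selection. This is only an illustration under stated assumptions: the helper `block` and the values 1 to 4 are made up, and 2x2 matrices stand in for the real data frames.

```r
# Toy 2x2 blocks standing in for the real data frames (values are made up)
block <- function(v) matrix(v, nrow = 2, ncol = 2)

# Stack the blocks that belong to one column vertically
col_athos   <- rbind(block(1), block(2))   # 4 rows x 2 columns
col_porthos <- rbind(block(3), block(4))   # 4 rows x 2 columns

# Then place the finished columns side by side
combined <- cbind(col_athos, col_porthos)  # 4 rows x 4 columns
print(dim(combined))
```

The order matters: `rbind()` first fixes which blocks share a column, and `cbind()` then decides the left-to-right order of those columns.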

I know I am still a long way off, but I cannot quite work out where to start.

Edit: @PierreGramme generated a useful mock data set. It would have been helpful to provide it in the first place, so I am adding it below.

Musketeers = c("Athos", "Porthos", "Aramis")
MyList = c("AthosVersusAthos.csv", "AthosVersusPorthos.csv", "AthosVersusAramis.csv", 
                    "PorthosVersusAthos.csv", "PorthosVersusPorthos.csv", "PorthosVersusAramis.csv", 
                    "AramisVersusAthos.csv", "AramisVersusPorthos.csv", "AramisVersusAramis.csv",
                    "BobVersusMary.csv", "LostCities.txt")
MyList = lapply(setNames(nm=MyList), function(x) matrix(rnorm(9), nrow=3, dimnames=list(c("a","b","c"), c("x","y","z"))) )

1 answer:

Answer 0 (score: 1)

First, a reproducible example. Is it faithful to your data? If so, I will add code to answer the question.

Musketeers = c("Athos", "Porthos", "Aramis")
MyList = c("AthosVersusAthos.csv", "AthosVersusPorthos.csv", "AthosVersusAramis.csv", 
                    "PorthosVersusAthos.csv", "PorthosVersusPorthos.csv", "PorthosVersusAramis.csv", 
                    "AramisVersusAthos.csv", "AramisVersusPorthos.csv", "AramisVersusAramis.csv",
                    "BobVersusMary.csv", "LostCities.txt")
MyList = lapply(setNames(nm=MyList), function(x) matrix(rnorm(9), nrow=3, dimnames=list(c("a","b","c"), c("x","y","z"))) )

Is it then correct that you want to concatenate nine of these matrices into a combined matrix of the shape you described?

Edit: here is the code to solve your problem:

# Helper function to extract the relevant portion of MyList and rbind() it
makeColumns = function(n){
    re = paste0("^",n,"Versus")
    sublist = MyList[grep(re, names(MyList))]
    names(sublist) = sub(re, "", sub("\\.csv$","", names(sublist)))

    # Make sure sublist contains every musketeer, then put it in Musketeers order
    stopifnot(all(Musketeers %in% names(sublist)))
    sublist = sublist[Musketeers]

    # Change row and col names so that they are unique in the final result
    sublist = lapply(names(sublist), function(m) {
        res = sublist[[m]]
        rownames(res) = paste0(m,"_",rownames(res))
        colnames(res) = paste0(n,"_",colnames(res))
        res
    })

    do.call(rbind, sublist)
}

lColumns = lapply(setNames(nm=Musketeers), makeColumns)
CombinedMatrix = do.call(cbind, lColumns)
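As a cross-check, the same layout can be built without regular expressions by constructing each file name directly and looking the block up by its exact name. This is a sketch rather than part of the answer above: `build_column`, `pairs` and `file_names` are names introduced here, and the mock data only mimics the nine 3x3 matrices from the reproducible example.

```r
Musketeers <- c("Athos", "Porthos", "Aramis")

# Mock data in the same shape as the example: nine named 3x3 matrices
pairs <- expand.grid(second = Musketeers, first = Musketeers,
                     stringsAsFactors = FALSE)
file_names <- paste0(pairs$first, "Versus", pairs$second, ".csv")
MyList <- lapply(setNames(nm = file_names), function(x) {
  matrix(rnorm(9), nrow = 3,
         dimnames = list(c("a", "b", "c"), c("x", "y", "z")))
})

# Build one column: look up each block by its exact name and stack them
build_column <- function(first) {
  blocks <- lapply(Musketeers, function(second) {
    m <- MyList[[paste0(first, "Versus", second, ".csv")]]
    rownames(m) <- paste0(second, "_", rownames(m))  # unique row labels
    colnames(m) <- paste0(first, "_", colnames(m))   # unique column labels
    m
  })
  do.call(rbind, blocks)
}

# Place the three columns side by side
CombinedMatrix <- do.call(cbind, lapply(Musketeers, build_column))
print(dim(CombinedMatrix))
print(rownames(CombinedMatrix))
```

Exact lookup fails immediately when a file is missing, whereas a grep-based selection is more convenient when the names are less regular. Either way, the block in rows `Porthos_*` and columns `Athos_*` should equal `MyList[["AthosVersusPorthos.csv"]]`.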