R: matrix to data frame with all combinations

Time: 2017-10-28 04:12:40

Tags: r matrix combinations

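I'm looking for the simplest way to take a matrix and convert it to a data frame in which each row represents one of the unique combinations from the matrix.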

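Where this comes in handy: I sometimes create something like a distance matrix, but the end user needs a table-like layout (e.g. in Excel) so that they can filter and review the individual scenarios and how they differ.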

1) What the initial matrix looks like

        Honda Dodge Ferrari
Honda       0     4      10
Dodge       4     0      10
Ferrari    10    10       0

2) The output I would like to produce (acceptable)

    vehicle1 vehicle2 distance
1    Honda    Honda        0
2    Honda    Dodge        4
3    Honda  Ferrari       10
4    Dodge    Honda        4
5    Dodge    Dodge        0
6    Dodge  Ferrari       10
7  Ferrari    Honda       10
8  Ferrari    Dodge       10
9  Ferrari  Ferrari        0

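3) The output I would like to produce (best case). In this version order does not matter, and pairs where vehicle1 and vehicle2 are the same (e.g. Honda, Honda, 0) are excluded.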

    vehicle1 vehicle2 distance
1    Honda    Dodge        4
2    Honda  Ferrari       10
3    Dodge  Ferrari       10

Code to reproduce:

#This is just to set-up outputs for display
matrix_input = matrix(c(0,4,10,4,0,10,10,10,0), nrow=3)
colnames(matrix_input) = c('Honda','Dodge','Ferrari')
rownames(matrix_input) = c('Honda','Dodge','Ferrari')

dataframe_output = data.frame(vehicle1=c("Honda","Honda", "Honda",
                                         "Dodge","Dodge", "Dodge",
                                         "Ferrari","Ferrari", "Ferrari"),  
                              vehicle2=c("Honda","Dodge", "Ferrari",
                                         "Honda","Dodge", "Ferrari",
                                         "Honda","Dodge", "Ferrari"),
                              distance=c(0,4,10,
                                         4,0,10,
                                         10,10,0))

dataframe_output.best_case = data.frame(vehicle1=c("Honda","Honda","Dodge"),
                                        vehicle2=c("Dodge","Ferrari","Ferrari"),
                                        distance=c(4,10,10))

#(1) initial matrix format
print(matrix_input)

#(2) desired output1 (acceptable)
print(dataframe_output)

#(3) desired output2 (best case)
#Ideally, I would like the operation to only pull unique 
# combinations (where order does not matter) AND exclude same values (e.g. Honda,Honda)
print(dataframe_output.best_case)

2 answers:

Answer 0: (score: 0)

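This may not be the best solution (I'm sure it goes the long way around the barn). I was hoping there would be a nice one- or two-line piece of code, or some existing package I could leverage, but I ended up getting it done with the code below. If anyone chimes in with a simpler way, I'm all ears.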

#Summary: 
#This function takes a square matrix as input 
# and returns a dataframe with all 'true' combinations.

#Notes:
#(1) Loops through each row, excluding the last row.
#    Once we get to last row, all combinations will have been covered.
#(2) For each row, we start at the column +1 to right of the matrix diagonal.
#    Everything to left of diagonal (per row) will have already been covered.
#    Everything ON the diagonal will be comparing to itself, which we don't need.

m_comb_to_df = function(m, cat1, cat2, val_type) #matrix combinations to dataframe
{
  #Calc total combinations (this will be total values right of diagonal in matrix)
  #For example, if 4x4 matrix, then total combinations will be 3+2+1
  #Formula for this is ((n-1)^2+(n-1))/2
  comb = (((nrow(m)-1)^2)+(nrow(m)-1))/2

  #create new dataframe for storing matrix combinations
  df = data.frame(rep(NA, comb), rep(NA, comb), rep(NA, comb))
  colnames(df)=c(cat1, cat2, val_type)

  dfr = 1                   #dataframe row counter (start at first row)
  for(r in seq_len(nrow(m)-1))   #loop through each row (except last); seq_len() skips the loop for a 1x1 matrix
  {                           
    for(c in (r+1):ncol(m)) #loop through columns, starting at right of diagonal (r+1)              
    {
      #print(paste(r,c,r+1)) #debug
      #store a single combination in current row (dfr) of dataframe
      df[[cat1]][dfr] = rownames(m)[r]    #store 'current' matrix row name
      df[[cat2]][dfr] = colnames(m)[c]    #store 'current' matrix column name
      df[[val_type]][dfr] = m[r,c]        #store 'current' matrix value
      dfr = dfr + 1                                
    }
  }
  return(df)
}

#matrix for testing
matrix_input = matrix(c(0,4,10,4,0,10,10,10,0), nrow=3)
colnames(matrix_input) = c('Honda','Dodge','Ferrari')
rownames(matrix_input) = c('Honda','Dodge','Ferrari')

#test function
m_comb_to_df(matrix_input, "car1", "car2", "distance")
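
The same upper-triangle walk can also be written without explicit loops. The following is a minimal sketch in base R, assuming a square matrix with row and column names; the names `m`, `idx` and `pairs` are only for illustration. It uses `upper.tri()` to mark the cells to the right of the diagonal and `which(..., arr.ind = TRUE)` to get their row/column positions, then sorts them so the rows come out in the same order as the loop above.

```r
# Illustrative input: same values as matrix_input above
cars <- c("Honda", "Dodge", "Ferrari")
m <- matrix(c(0, 4, 10, 4, 0, 10, 10, 10, 0), nrow = 3,
            dimnames = list(cars, cars))

# Row/column positions of every cell strictly right of the diagonal
idx <- which(upper.tri(m), arr.ind = TRUE)

# which() returns positions column by column; reorder by row, then column,
# so the result matches the row-by-row order of the nested loops
idx <- idx[order(idx[, "row"], idx[, "col"]), , drop = FALSE]

# Look up the names and values for each position
pairs <- data.frame(
  vehicle1 = rownames(m)[idx[, "row"]],
  vehicle2 = colnames(m)[idx[, "col"]],
  distance = m[idx],
  stringsAsFactors = FALSE
)

print(pairs)
```

For an n x n matrix this yields `choose(n, 2)` rows, which is the same count as the `((n-1)^2+(n-1))/2` formula used in the function.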

Answer 1: (score: 0)

Here is a shorter version that solves this using melt from the reshape2 package:

# load magrittr to use the pipe operator. 
library(magrittr)

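# blank out the upper triangle so that every unordered pair appears only once
# (this overwrites matrix_input; the lower triangle is kept, so pairs come out
# as (Dodge, Honda) rather than (Honda, Dodge))
matrix_input[upper.tri(matrix_input)] <- NA

# melt the matrix into long format (Var1 = row name, Var2 = column name)
df <- reshape2::melt(matrix_input, na.rm = TRUE)

# drop the zero-distance rows (the diagonal, plus any off-diagonal zeros) and rename the columns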
df %>%
  dplyr::filter(!(value == 0)) %>%
  dplyr::rename(vehicle1 = Var1, 
                vehicle2 = Var2, 
                distance = value)
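
If installing extra packages is not an option, roughly the same result can be had in base R. This is only a sketch, assuming a symmetric square matrix with row and column names; `m`, `all_pairs` and `best` are illustrative names. It uses `as.data.frame(as.table(...))` to get every row/column pair (the "acceptable" layout, though ordered column by column), then keeps only the upper triangle so that vehicle1/vehicle2 come out in the orientation shown in the question and off-diagonal zeros are not dropped.

```r
# Illustrative input: same values as matrix_input in the question
cars <- c("Honda", "Dodge", "Ferrari")
m <- matrix(c(0, 4, 10, 4, 0, 10, 10, 10, 0), nrow = 3,
            dimnames = list(cars, cars))

# Long format with one row per cell: Var1 = row name, Var2 = column name
all_pairs <- as.data.frame(as.table(m),
                           responseName = "distance",
                           stringsAsFactors = FALSE)
names(all_pairs)[1:2] <- c("vehicle1", "vehicle2")

# as.table() flattens the matrix column by column, the same order as
# as.vector(upper.tri(m)), so the logical mask lines up with the rows
best <- all_pairs[as.vector(upper.tri(m)), ]
rownames(best) <- NULL

print(all_pairs)  # all 9 combinations
print(best)       # unique pairs only, diagonal excluded
```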