Melting a matrix with an external covariate in R

Time: 2018-02-19 16:49:08

Tags: r reshape reshape2 melt

I have a matrix:

mat1 <- matrix(rnorm(8), ncol = 4,
  dimnames = list(c('R1','R2'), c('C1','C2','C3','C4')))

> mat1
          C1         C2         C3        C4
R1  1.226139 -1.0604743 -0.1803689 0.3852505
R2 -1.232622 -0.5567295 -0.4146919 0.2433812

and a covariate that matches the column names of the matrix:

> covariate   <- factor(c('A','A','B','B'))
> t(data.frame(covariate, colnames(mat1)))
               [,1] [,2] [,3] [,4]
covariate      "A"  "A"  "B"  "B" 
colnames.mat1. "C1" "C2" "C3" "C4"

Melting the matrix (with reshape2 loaded) gives:

> melt( mat1 )
  Var1 Var2      value
1   R1   C1  1.2261395
2   R2   C1 -1.2326215
3   R1   C2 -1.0604743
4   R2   C2 -0.5567295
5   R1   C3 -0.1803689
6   R2   C3 -0.4146919
7   R1   C4  0.3852505
8   R2   C4  0.2433812

But I would like to melt the matrix together with the covariate, so that each row of the result also carries the covariate value (A or B) of its column.

I think there must be a way to get this with the standard melt function. I would appreciate any help.

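1 answer:

Answer 0 (score: 1)

Perhaps the simplest approach is to rename the columns of the matrix first and then melt.

Here are some examples, first using "data.table" and then using the "tidyverse":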

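library(data.table)
# Prefix each column name with its covariate, melt the matrix (reshape2 method),
# then split the combined name back into a covariate column and a column-name column.
setDT(reshape2::melt(`colnames<-`(mat1, paste(c('A','A','B','B'), colnames(mat1), sep = "_"))))[
  , c("cov", "V1") := tstrsplit(Var2, "_")][, Var2 := NULL][]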
#    Var1      value cov V1
# 1:   R1  1.2261390   A C1
# 2:   R2 -1.2326220   A C1
# 3:   R1 -1.0604743   A C2
# 4:   R2 -0.5567295   A C2
# 5:   R1 -0.1803689   B C3
# 6:   R2 -0.4146919   B C3
# 7:   R1  0.3852505   B C4
# 8:   R2  0.2433812   B C4


library(tidyverse)
`colnames<-`(mat1, paste(c('A','A','B','B'), colnames(mat1), sep = "_")) %>% 
  as.data.frame() %>%
  rownames_to_column() %>%
  gather(var, val, -rowname) %>%
  separate(var, into = c("cov", "var1"))
#   rowname cov var1        val
# 1      R1   A   C1  1.2261390
# 2      R2   A   C1 -1.2326220
# 3      R1   A   C2 -1.0604743
# 4      R2   A   C2 -0.5567295
# 5      R1   B   C3 -0.1803689
# 6      R2   B   C3 -0.4146919
# 7      R1   B   C4  0.3852505
# 8      R2   B   C4  0.2433812
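
As an alternative that does not encode the covariate in the column names, one could melt first and then look the covariate up by column name. The following is only a minimal sketch: it assumes reshape2 is installed, that `covariate` is in the same order as `colnames(mat1)` (as in the question), and the name of the new column, `covariate`, is just an illustrative choice.

```r
library(reshape2)

# Sample data from the question, with fixed values instead of rnorm().
mat1 <- matrix(
  c(1.226139, -1.232622, -1.0604743, -0.5567295,
    -0.1803689, -0.4146919, 0.3852505, 0.2433812),
  ncol = 4,
  dimnames = list(c("R1", "R2"), c("C1", "C2", "C3", "C4"))
)
covariate <- factor(c("A", "A", "B", "B"))

# Melt as usual; Var2 holds the original column names.
long <- melt(mat1)

# For each row, find the position of its column name in colnames(mat1)
# and pick the covariate at that position.
long$covariate <- covariate[match(long$Var2, colnames(mat1))]
long
```

Because the lookup goes through `match()`, it does not depend on the order of the rows in the melted data, only on `covariate` lining up with the matrix columns.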

Sample data:

mat1 <- structure(c(1.226139, -1.232622, -1.0604743, -0.5567295, -0.1803689, 
    -0.4146919, 0.3852505, 0.2433812), .Dim = c(2L, 4L), .Dimnames = list(
        c("R1", "R2"), c("C1", "C2", "C3", "C4")))