Reshaping wide data to long using multiple variables

Time: 2019-08-29 00:25:07

Tags: r reshape

My data currently looks like this:

wide.df <- read.table(header = TRUE, sep = ",", text = "
ID, left.mid.brain, right.mid.brain, left.lat.brain, right.lat.brain, score, group
100, 18 , 4, 29, 30, 40, 0
101, 19,  7, 33, 40, 29, 0
103, 19, 19, 22, 30, 33, 0
200, 29, 30, 22, 33, 11, 1
233, 100, 33, 22, 44, 55, 1")

I need to convert the data to long format, like this:

ID  group  left.or.right  mid.or.lat    brain     score
100   0          0             0           29        40   # 0 = left, 0=lat 
100   0          1             0           30        40   # 1 = right, 0=lat
100   0          0             1           18        40   # 0 = left, 1 = mid
100   0          1             1            4        40   # 1 = right, 1 = mid
101   0          0             0           33        29   # 0 = left, 0 = lat
.
.
.
.
.
233   1           1            1            33        55   # 1= right, 1= mid

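The columns left.mid.brain, right.mid.brain, left.lat.brain, and right.lat.brain are turned into factors (left.or.right and mid.or.lat), but their values stay the same, so each participant ends up with four rows.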

2 answers:

Answer 0 (score: 3)

The tidyverse (specifically the dplyr and tidyr packages) is well suited to this kind of operation:

library(tidyverse)

long.df <- wide.df %>% 
  gather(variable, brain, left.mid.brain, right.mid.brain, left.lat.brain, right.lat.brain) %>% 
  mutate(
    left.or.right = ifelse(grepl('left', variable), 0, 1),
    mid.or.lat = ifelse(grepl('lat', variable), 0, 1)
  ) %>% 
  select(ID, group, left.or.right, mid.or.lat, brain, score) %>% 
  arrange(ID)

    ID group left.or.right mid.or.lat brain score
1  100     0             0          1    18    40
2  100     0             1          1     4    40
3  100     0             0          0    29    40
4  100     0             1          0    30    40
5  101     0             0          1    19    29
6  101     0             1          1     7    29
7  101     0             0          0    33    29
8  101     0             1          0    40    29
9  103     0             0          1    19    33
10 103     0             1          1    19    33
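
As a hedged aside: since tidyr 1.0.0, gather() is superseded by pivot_longer(). The sketch below is one way the same reshape might look with the newer function; it assumes tidyr 1.0.0 or later, uses ends_with(".brain") to pick the four measurement columns (my choice, not part of the answer above), and rebuilds wide.df so that it runs on its own.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, rebuilt here so the sketch is self-contained.
wide.df <- read.table(header = TRUE, sep = ",", strip.white = TRUE, text = "
ID,left.mid.brain,right.mid.brain,left.lat.brain,right.lat.brain,score,group
100,18,4,29,30,40,0
101,19,7,33,40,29,0
103,19,19,22,30,33,0
200,29,30,22,33,11,1
233,100,33,22,44,55,1")

long.df <- wide.df %>%
  # Stack the four *.brain columns into a name column and a value column.
  pivot_longer(
    cols = ends_with(".brain"),
    names_to = "variable",
    values_to = "brain"
  ) %>%
  # Derive the two indicator columns from the original column name.
  mutate(
    left.or.right = ifelse(grepl("left", variable), 0, 1),
    mid.or.lat = ifelse(grepl("lat", variable), 0, 1)
  ) %>%
  select(ID, group, left.or.right, mid.or.lat, brain, score) %>%
  arrange(ID)
```

Note that pivot_longer() returns a tibble rather than a plain data frame; wrap the result in as.data.frame() if that matters downstream.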

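Answer 1 (score: 1)

Here is another dplyr/tidyr-based approach that scales well. Once the data is in long form, you have a column with values like "right.mid.brain" that you want to split into "right" and "mid". tidyr::separate can do that split easily on "\\." and avoids a lot of hard-coding. It also leaves you with a dummy column, which I deal with further down.

At that point, you have: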

library(dplyr)
library(tidyr)

# wide.df is the data frame from the question
wide.df %>%
  gather(key, value = brain, -ID, -score, -group) %>%
  separate(key, into = c("left.or.right", "mid.or.lat", "dummy"), sep = "\\.") %>%
  head()
#>    ID score group left.or.right mid.or.lat dummy brain
#> 1 100    40     0          left        mid brain    18
#> 2 101    29     0          left        mid brain    19
#> 3 103    33     0          left        mid brain    19
#> 4 200    11     1          left        mid brain    29
#> 5 233    55     1          left        mid brain   100
#> 6 100    40     0         right        mid brain     4

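If you need more complex recoding, you can use some of the forcats functions to recode factor levels. In this case it is simple enough to convert each column with a condition such as left.or.right == "right", which gives 1 if true and 0 if false (i.e. if it is left). Then select the columns in the order you want, which also drops the dummy column.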

long <- wide.df %>%
  gather(key, value = brain, -ID, -score, -group) %>%
  separate(key, into = c("left.or.right", "mid.or.lat", "dummy"), sep = "\\.") %>%
  mutate(left.or.right = as.numeric(left.or.right == "right"),
         mid.or.lat = as.numeric(mid.or.lat == "mid")) %>%
  select(ID, group, left.or.right, mid.or.lat, brain, score) %>%
  arrange(ID)

head(long)
#>    ID group left.or.right mid.or.lat brain score
#> 1 100     0             0          1    18    40
#> 2 100     0             1          1     4    40
#> 3 100     0             0          0    29    40
#> 4 100     0             1          0    30    40
#> 5 101     0             0          1    19    29
#> 6 101     0             1          1     7    29
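
The gather() plus separate() steps above can probably be collapsed into a single pivot_longer() call. This is a sketch that assumes tidyr 1.0.0 or later: names_sep splits the column names on the dot, and an NA entry in names_to discards the trailing "brain" piece, so no dummy column is created. The ends_with(".brain") selector and the rebuilt wide.df are my additions for illustration, not part of the answer.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, rebuilt here so the sketch is self-contained.
wide.df <- read.table(header = TRUE, sep = ",", strip.white = TRUE, text = "
ID,left.mid.brain,right.mid.brain,left.lat.brain,right.lat.brain,score,group
100,18,4,29,30,40,0
101,19,7,33,40,29,0
103,19,19,22,30,33,0
200,29,30,22,33,11,1
233,100,33,22,44,55,1")

long <- wide.df %>%
  pivot_longer(
    cols = ends_with(".brain"),
    # Split e.g. "right.mid.brain" into "right" and "mid"; NA drops "brain".
    names_to = c("left.or.right", "mid.or.lat", NA),
    names_sep = "\\.",
    values_to = "brain"
  ) %>%
  # Recode the text labels as 0/1 indicators: right = 1, mid = 1.
  mutate(
    left.or.right = as.numeric(left.or.right == "right"),
    mid.or.lat = as.numeric(mid.or.lat == "mid")
  ) %>%
  select(ID, group, left.or.right, mid.or.lat, brain, score) %>%
  arrange(ID)
```

If you would rather keep readable labels than 0/1 codes, leaving out the mutate() step keeps left.or.right and mid.or.lat as character columns, which can then be turned into factors.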