My data currently looks like this:
wide.df <- read.table(header = T, sep = ",", text = "
ID, left.mid.brain, right.mid.brain, left.lat.brain, right.lat.brain, score, group
100, 18 , 4, 29, 30, 40, 0
101, 19, 7, 33, 40, 29, 0
103, 19, 19, 22, 30, 33, 0
200, 29, 30, 22, 33, 11, 1
233, 100, 33, 22, 44, 55, 1")
I need to convert the data to long format, like this:
ID group left.or.right mid.or.lat brain score
100 0 0 0 29 40 # 0 = left, 0=lat
100 0 1 0 30 40 # 1 = right, 0=lat
100 0 0 1 18 40 # 0 = left, 1 = mid
100 0 1 1 4 40 # 1 = right, 1 = mid
101 0 0 0 33 29 # 0 = left, 0 = lat
.
.
.
.
.
233 1 1 1 33 55 # 1= right, 1= mid
That is, left.mid.brain, right.mid.brain, left.lat.brain and right.lat.brain should be turned into factors while their values stay the same, so that each participant has four rows.
Answer 0 (score: 3)
The tidyverse (specifically the dplyr and tidyr packages) handles this kind of operation well:
library(tidyverse)
long.df <- wide.df %>%
gather(variable, brain, left.mid.brain, right.mid.brain, left.lat.brain, right.lat.brain) %>%
mutate(
left.or.right = ifelse(grepl('left', variable), 0, 1),
mid.or.lat = ifelse(grepl('lat', variable), 0, 1)
) %>%
select(ID, group, left.or.right, mid.or.lat, brain, score) %>%
arrange(ID)
ID group left.or.right mid.or.lat brain score
1 100 0 0 1 18 40
2 100 0 1 1 4 40
3 100 0 0 0 29 40
4 100 0 1 0 30 40
5 101 0 0 1 19 29
6 101 0 1 1 7 29
7 101 0 0 0 33 29
8 101 0 1 0 40 29
9 103 0 0 1 19 33
10 103 0 1 1 19 33
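The question asks for factors, while the columns above are numeric 0/1 codes. A minimal sketch of one way to convert them with base R's factor(), assuming the coding from the question (0 = left, 1 = right; 0 = lat, 1 = mid). The name long.fct is made up for illustration, and the data and pipeline are repeated only so the block runs on its own:

```r
library(dplyr)
library(tidyr)

# Data from the question
wide.df <- read.table(header = TRUE, sep = ",", text = "
ID, left.mid.brain, right.mid.brain, left.lat.brain, right.lat.brain, score, group
100, 18 , 4, 29, 30, 40, 0
101, 19, 7, 33, 40, 29, 0
103, 19, 19, 22, 30, 33, 0
200, 29, 30, 22, 33, 11, 1
233, 100, 33, 22, 44, 55, 1")

# Same reshaping as in the answer above
long.df <- wide.df %>%
  gather(variable, brain, left.mid.brain, right.mid.brain, left.lat.brain, right.lat.brain) %>%
  mutate(
    left.or.right = ifelse(grepl("left", variable), 0, 1),
    mid.or.lat = ifelse(grepl("lat", variable), 0, 1)
  ) %>%
  select(ID, group, left.or.right, mid.or.lat, brain, score) %>%
  arrange(ID)

# Assumption: labelled factors are wanted; long.fct is a hypothetical name
long.fct <- long.df %>%
  mutate(
    left.or.right = factor(left.or.right, levels = c(0, 1), labels = c("left", "right")),
    mid.or.lat = factor(mid.or.lat, levels = c(0, 1), labels = c("lat", "mid"))
  )

# Each participant should contribute four rows
count(long.fct, ID)
```

If the numeric codes themselves should be the factor levels, dropping the labels argument keeps "0" and "1" as the level names.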
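Answer 1 (score: 1)
Another dplyr/tidyr-based approach that scales well. After reshaping to long format, you have a column with values like "right.mid.brain" that you want to split into "right" and "mid". tidyr::separate can do that easily by splitting on "\\.", which avoids a lot of hard-coding. The split also produces a dummy column, which is dropped later.
At that point, you have: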
library(dplyr)
library(tidyr)
# Reshape to long format, then split the old column names on the dots
wide.df %>%
gather(key, value = brain, -ID, -score, -group) %>%
separate(key, into = c("left.or.right", "mid.or.lat", "dummy"), sep = "\\.") %>%
head()
#> ID score group left.or.right mid.or.lat dummy brain
#> 1 100 40 0 left mid brain 18
#> 2 101 29 0 left mid brain 19
#> 3 103 33 0 left mid brain 19
#> 4 200 11 1 left mid brain 29
#> 5 233 55 1 left mid brain 100
#> 6 100 40 0 right mid brain 4
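If you needed more complex recoding, you could use some forcats functions to recode the factor levels. In this case it is simple enough to convert each column based on a condition such as left.or.right == "right": 1 if true, 0 if false (i.e. left). Then select the columns in the desired order.
long <- wide.df %>%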
gather(key, value = brain, -ID, -score, -group) %>%
separate(key, into = c("left.or.right", "mid.or.lat", "dummy"), sep = "\\.") %>%
mutate(left.or.right = as.numeric(left.or.right == "right"),
mid.or.lat = as.numeric(mid.or.lat == "mid")) %>%
select(ID, group, left.or.right, mid.or.lat, brain, score) %>%
arrange(ID)
head(long)
#> ID group left.or.right mid.or.lat brain score
#> 1 100 0 0 1 18 40
#> 2 100 0 1 1 4 40
#> 3 100 0 0 0 29 40
#> 4 100 0 1 0 30 40
#> 5 101 0 0 1 19 29
#> 6 101 0 1 1 7 29
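In current tidyr (1.0.0 or later), gather() is superseded by pivot_longer(), which can do the reshape and the split in one call. A sketch under that version assumption; the name long2 is made up for illustration, and an NA entry in names_to discards the trailing "brain" piece instead of creating a dummy column:

```r
library(dplyr)
library(tidyr)

# Data from the question
wide.df <- read.table(header = TRUE, sep = ",", text = "
ID, left.mid.brain, right.mid.brain, left.lat.brain, right.lat.brain, score, group
100, 18 , 4, 29, 30, 40, 0
101, 19, 7, 33, 40, 29, 0
103, 19, 19, 22, 30, 33, 0
200, 29, 30, 22, 33, 11, 1
233, 100, 33, 22, 44, 55, 1")

long2 <- wide.df %>%
  pivot_longer(
    cols = ends_with(".brain"),
    # Split each column name on the dots; NA drops the third piece ("brain")
    names_to = c("left.or.right", "mid.or.lat", NA),
    names_sep = "\\.",
    values_to = "brain"
  ) %>%
  # Same 0/1 coding as above: right = 1, mid = 1
  mutate(
    left.or.right = as.numeric(left.or.right == "right"),
    mid.or.lat = as.numeric(mid.or.lat == "mid")
  ) %>%
  select(ID, group, left.or.right, mid.or.lat, brain, score) %>%
  arrange(ID)

head(long2)
```

Note that pivot_longer() returns a tibble rather than a plain data frame; wrap the result in as.data.frame() if the printed format matters.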