I am working with a dataset that is partly in wide format and partly in long format. It looks like this:
ID week1 week2 week3 ... week12
1 2 NA NA ... NA
1 NA 3 NA ... NA
1 NA NA 3 ... NA
...
1 NA NA NA ... 4
2 4 NA NA ... NA
2 NA 5 NA ... NA
2 NA NA 3 ... NA
I am struggling to convert it to a pure long format for analysis. I would like it to look like this:
ID week value
1 1 2
1 2 3
1 3 3
...
1 12 4
2 1 4
2 2 5
2 3 3
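Can anyone suggest how to do this in R? I have tried reshape2 and dplyr / tidyr, but whenever I select ID as the id variable I end up with too many observations.
Answer 0: (score: 0)
How about this?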
library(dplyr)
# small data sample
df <- read.table(text = 'ID week1 week2 week3 week4
1 2 NA NA NA
1 NA 3 NA NA
1 NA NA 3 NA
1 NA NA NA 4
2 4 NA NA NA
2 NA 5 NA NA
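2 NA NA 3 NA', header = TRUE)
# melt() from data.table is meant for data.tables, so convert the data.frame first
df %>%
  data.table::as.data.table() %>%
  data.table::melt(id.vars = 'ID') %>%
  na.omit()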
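The "too many observations" symptom in the question comes from the NA cells: melting keeps one row per ID row and week column, and only the non-NA ones carry a value. A minimal base R sketch of that count on the same sample data (the names week_cols, n_cells and n_values are illustrative, not from the original answer):

```r
# Same small sample as above
df <- read.table(text = 'ID week1 week2 week3 week4
1 2 NA NA NA
1 NA 3 NA NA
1 NA NA 3 NA
1 NA NA NA 4
2 4 NA NA NA
2 NA 5 NA NA
2 NA NA 3 NA', header = TRUE)

week_cols <- setdiff(names(df), "ID")

# Rows a melt produces before any NA handling: one per (row, week column)
n_cells <- nrow(df) * length(week_cols)

# Rows that remain once the NA cells are dropped
n_values <- sum(!is.na(df[week_cols]))

c(before = n_cells, after = n_values)
```

On this sample that is 28 rows before dropping NAs and 7 after, which matches the desired long table.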
Answer 1: (score: 0)
1) gather: Using wide, shown reproducibly in the Note at the end, use gather to convert wide to long format, drop the NA rows and sort.
library(dplyr)
library(tidyr)
wide %>%
gather("week", "value", -ID) %>%
drop_na %>%
arrange(ID, week)
giving:
ID week value
1 1 week1 2
2 1 week2 3
3 1 week3 3
4 1 week4 4
5 2 week1 4
6 2 week2 5
7 2 week3 3
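Here week is a character column ("week1", "week2", ...), whereas the question asks for a numeric week; with 12 weeks a character sort would also put "week10" before "week2". One possible variation, as a sketch only, uses tidyr's pivot_longer to strip the prefix and convert to integer. It assumes a reasonably recent tidyr (names_transform is not available in older releases), and the name long_num is illustrative.

```r
library(tidyr)

# Same sample data as `wide` in the Note at the end
wide <- read.table(text = "
ID week1 week2 week3 week4
1 2 NA NA NA
1 NA 3 NA NA
1 NA NA 3 NA
1 NA NA NA 4
2 4 NA NA NA
2 NA 5 NA NA
2 NA NA 3 NA", header = TRUE)

long_num <- pivot_longer(
  wide,
  cols = -ID,
  names_to = "week",
  names_prefix = "week",                      # strip the "week" prefix
  names_transform = list(week = as.integer),  # "1" becomes 1L
  values_to = "value",
  values_drop_na = TRUE                       # drop the NA cells
)

# Sort numerically by ID, then week
long_num <- long_num[order(long_num$ID, long_num$week), ]
long_num
```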
2) reshape: Using only base R:
varying <- list(value = 2:5)
long <- na.omit(reshape(wide, dir = "long", timevar = "week",
varying = varying, v.names = names(varying)))[1:3]
long[order(long$ID, long$week), ]
giving:
ID week value
1.1 1 1 2
2.2 1 2 3
3.3 1 3 3
4.4 1 4 4
5.1 2 1 4
6.2 2 2 5
7.3 2 3 3
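3) data.table: Using varying from (2), we can use melt from data.table. Note that we could have specified either id.vars or measure.vars, but the comments indicated that we may want to generalize this to multiple variables, and the measure.vars approach generalizes.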
library(data.table)
longDT <- na.omit(melt(as.data.table(wide), measure.vars = varying,
variable.name = "week"))
setkey(longDT, ID, week)
longDT
giving:
ID week value
1: 1 week1 2
2: 1 week2 3
3: 1 week3 3
4: 1 week4 4
5: 2 week1 4
6: 2 week2 5
7: 2 week3 3
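For the single-variable case, the id.vars form mentioned above might look like the following sketch. It also converts week to an integer, as the question's desired output has numeric weeks. The name longDT_id is illustrative, and na.rm = TRUE is used here in place of the na.omit call.

```r
library(data.table)

# Same sample data as `wide` in the Note at the end
wide <- read.table(text = "
ID week1 week2 week3 week4
1 2 NA NA NA
1 NA 3 NA NA
1 NA NA 3 NA
1 NA NA NA 4
2 4 NA NA NA
2 NA 5 NA NA
2 NA NA 3 NA", header = TRUE)

# id.vars variant: every non-ID column is treated as a measure column,
# and na.rm = TRUE drops the NA cells during the melt
longDT_id <- melt(as.data.table(wide), id.vars = "ID",
                  variable.name = "week", na.rm = TRUE)

# week is a factor ("week1", ...); turn it into an integer week number
longDT_id[, week := as.integer(sub("week", "", as.character(week)))]

setkey(longDT_id, ID, week)
longDT_id
```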
Note: the input in reproducible form is:
Lines <- "
ID week1 week2 week3 week4
1 2 NA NA NA
1 NA 3 NA NA
1 NA NA 3 NA
1 NA NA NA 4
2 4 NA NA NA
2 NA 5 NA NA
2 NA NA 3 NA"
wide <- read.table(text = Lines, header = TRUE)
Regarding the comment that data.table's melt supports multiple variables, suppose we have the following:
Lines2 <- "
ID week1var1 week1var2 week2var1 week2var2 week3var1 week3var2 week4var1 week4var2
1 1 2 20 NA NA NA NA NA NA
2 1 NA NA 3 30 NA NA NA NA
3 1 NA NA NA NA 3 30 NA NA
4 1 NA NA NA NA NA NA 4 40
5 2 4 40 NA NA NA NA NA NA
6 2 NA NA 5 50 NA NA NA NA
7 2 NA NA NA NA 3 30 NA NA"
wide2 <- read.table(text = Lines2, header = TRUE)
library(data.table)
varying2 <- split(names(wide2)[-1],
sub("(.*\\d)(\\D.*)", "\\2", names(wide2)[-1]))
longDT2 <- na.omit(melt(as.data.table(wide2), measure.vars = varying2,
variable.name = "week"))
setkey(longDT2, ID, week)
longDT2
giving:
ID week var1 var2
1: 1 1 2 20
2: 1 2 3 30
3: 1 3 3 30
4: 1 4 4 40
5: 2 1 4 40
6: 2 2 5 50
7: 2 3 3 30
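For comparison, the same multi-variable reshape can probably also be done with tidyr's pivot_longer and its ".value" sentinel. This is a sketch and not part of the original answer: it assumes tidyr 1.1.0 or later and column names of the form week<number>var<number>, and the name long2 is illustrative.

```r
library(tidyr)

# Same input as Lines2 above; the leading numbers become row names because
# the header has one field fewer than the data rows
Lines2 <- "
ID week1var1 week1var2 week2var1 week2var2 week3var1 week3var2 week4var1 week4var2
1 1 2 20 NA NA NA NA NA NA
2 1 NA NA 3 30 NA NA NA NA
3 1 NA NA NA NA 3 30 NA NA
4 1 NA NA NA NA NA NA 4 40
5 2 4 40 NA NA NA NA NA NA
6 2 NA NA 5 50 NA NA NA NA
7 2 NA NA NA NA 3 30 NA NA"
wide2 <- read.table(text = Lines2, header = TRUE)

long2 <- pivot_longer(
  wide2,
  cols = -ID,
  # first capture group -> week, second -> name of the output value column
  names_to = c("week", ".value"),
  names_pattern = "week(\\d+)(var\\d+)",
  names_transform = list(week = as.integer),
  values_drop_na = TRUE   # drop rows where var1 and var2 are both NA
)

long2 <- long2[order(long2$ID, long2$week), ]
long2
```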