Data in a mix of long and wide format, need to convert to long in R

Time: 2018-11-28 15:37:45

Tags: r dplyr tidyverse tidyr reshape2

I am working with a dataset that is in a mix of two formats, partly long and partly wide. It looks like this:

ID week1 week2 week3 ... week12  
1   2     NA     NA  ...  NA  
1   NA    3      NA  ...  NA
1   NA    NA     3   ...  NA
...
1   NA    NA     NA  ...  4
2   4     NA     NA  ...  NA
2   NA    5      NA  ...  NA
2   NA    NA     3   ...  NA

I am struggling to convert this to a pure long format for analysis. I would like it to be set up as:

ID week value
1   1    2
1   2    3
1   3    3
...
1   12   4
2   1    4
2   2    5
2   3    3

Can anyone suggest how to do this in R? I have tried reshape2 and dplyr/tidyr, but when I select ID as the id variable I always end up with too many observations.

2 answers:

Answer 0: (score: 0)

How about this?

library(dplyr)

# small data sample
df <- read.table(text = 'ID week1 week2 week3 week4  
1   2     NA     NA    NA  
1   NA    3      NA    NA
1   NA    NA     3     NA
1   NA    NA     NA    4
2   4     NA     NA    NA
2   NA    5      NA    NA
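2   NA    NA     3     NA', header = TRUE)

df %>% 
   data.table::as.data.table() %>%   # convert first: data.table's melt() is designed for data.tables
   data.table::melt(id.vars = 'ID') %>% 
   na.omit()                         # drop the rows whose value is NA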
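
To illustrate why the question reports too many observations, here is a minimal sketch that counts the rows before and after dropping the NAs. It assumes the same small sample as above; the name melted is only for illustration.

```r
library(data.table)

# Same small sample as in the answer above
df <- read.table(text = "ID week1 week2 week3 week4
1   2     NA     NA    NA
1   NA    3      NA    NA
1   NA    NA     3     NA
1   NA    NA     NA    4
2   4     NA     NA    NA
2   NA    5      NA    NA
2   NA    NA     3     NA", header = TRUE)

# Melting keeps every cell, including the NA ones
melted <- melt(as.data.table(df), id.vars = "ID")

nrow(melted)           # 28: 7 input rows x 4 week columns
nrow(na.omit(melted))  # 7: only the observed values remain
```

Each input row holds a single non-NA value, so melting alone yields one row per cell, and the NA rows have to be removed afterwards.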

Answer 1: (score: 0)

1) gather. Using wide, shown reproducibly in Note 1 at the end, use gather to convert to long form, then remove the NA rows and sort.

library(dplyr)
library(tidyr)

wide %>%
  gather("week", "value", -ID) %>%
  drop_na %>%
  arrange(ID, week)

giving:

  ID  week value
1  1 week1     2
2  1 week2     3
3  1 week3     3
4  1 week4     4
5  2 week1     4
6  2 week2     5
7  2 week3     3
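
As a possible variation (not part of the original answer), newer versions of tidyr provide pivot_longer, which can strip the week prefix and store the week as a number, as the question asks. This is a sketch assuming tidyr 1.1.0 or later; the name long is only for illustration. A numeric week also sorts correctly when there are 12 weeks, whereas the character labels would place week10 before week2.

```r
library(dplyr)
library(tidyr)

# Same input as in Note 1
wide <- read.table(text = "
ID week1 week2 week3 week4
1   2     NA     NA   NA
1   NA    3      NA   NA
1   NA    NA     3    NA
1   NA    NA     NA   4
2   4     NA     NA   NA
2   NA    5      NA   NA
2   NA    NA     3    NA", header = TRUE)

long <- wide %>%
  pivot_longer(
    cols = -ID,
    names_to = "week",
    names_prefix = "week",                      # strip the "week" prefix
    names_transform = list(week = as.integer),  # store week as an integer
    values_to = "value",
    values_drop_na = TRUE                       # drop the NA cells
  ) %>%
  arrange(ID, week)

long
```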

2) reshape. Using only base R:

varying <- list(value = 2:5)
long <- na.omit(reshape(wide, dir = "long", timevar = "week", 
  varying = varying, v.names = names(varying)))[1:3]
long[order(long$ID, long$week), ]

giving:

    ID week value
1.1  1    1     2
2.2  1    2     3
3.3  1    3     3
4.4  1    4     4
5.1  2    1     4
6.2  2    2     5
7.3  2    3     3

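3) data.table. Using varying from (2), we can use melt from data.table. Note that we could specify either id.vars or measure.vars, but a comment pointed out that we might want to generalize this to multiple variables, and the measure.vars approach generalizes to that case.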

library(data.table)
longDT <- na.omit(melt(as.data.table(wide), measure.vars = varying, 
  variable.name = "week"))
setkey(longDT, ID, week)
longDT

giving:

   ID  week value
1:  1 week1     2
2:  1 week2     3
3:  1 week3     3
4:  1 week4     4
5:  2 week1     4
6:  2 week2     5
7:  2 week3     3
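
For comparison, the id.vars form mentioned above might look like the sketch below. It also uses melt's na.rm argument in place of na.omit and converts the week labels to integers. The conversion step is an addition for illustration, not part of the original answer.

```r
library(data.table)

# Same input as in Note 1
wide <- read.table(text = "
ID week1 week2 week3 week4
1   2     NA     NA   NA
1   NA    3      NA   NA
1   NA    NA     3    NA
1   NA    NA     NA   4
2   4     NA     NA   NA
2   NA    5      NA   NA
2   NA    NA     3    NA", header = TRUE)

# Name the id column and let melt treat every other column as a measure
longDT <- melt(as.data.table(wide), id.vars = "ID",
  variable.name = "week", na.rm = TRUE)

# Turn the factor labels week1, week2, ... into the integers 1, 2, ...
longDT[, week := as.integer(sub("week", "", as.character(week)))]

setkey(longDT, ID, week)
longDT
```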

Note 1

The input in reproducible form is:

Lines <- "
ID week1 week2 week3 week4
1   2     NA     NA   NA  
1   NA    3      NA   NA
1   NA    NA     3    NA
1   NA    NA     NA   4
2   4     NA     NA   NA
2   NA    5      NA   NA
2   NA    NA     3    NA"
wide <- read.table(text = Lines, header = TRUE)

Note 2

Regarding data.table's melt supporting multiple variables, suppose we have the following:

Lines2 <- "
ID week1var1 week1var2 week2var1 week2var2 week3var1 week3var2 week4var1 week4var2
1 1 2 20 NA NA NA NA NA NA
2 1 NA NA 3 30 NA NA NA NA
3 1 NA NA NA NA 3 30 NA NA
4 1 NA NA NA NA NA NA 4 40
5 2 4 40 NA NA NA NA NA NA
6 2 NA NA 5 50 NA NA NA NA
7 2 NA NA NA NA 3 30 NA NA"
wide2 <- read.table(text = Lines2, header = TRUE)

library(data.table)

varying2 <- split(names(wide2)[-1], 
  sub("(.*\\d)(\\D.*)", "\\2", names(wide2)[-1]))

longDT2 <- na.omit(melt(as.data.table(wide2), measure.vars = varying2, 
  variable.name = "week"))
setkey(longDT2, ID, week)
longDT2

giving:

   ID week var1 var2
1:  1    1    2   20
2:  1    2    3   30
3:  1    3    3   30
4:  1    4    4   40
5:  2    1    4   40
6:  2    2    5   50
7:  2    3    3   30
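
A similar multi-variable reshape can probably be written with tidyr's pivot_longer and its special ".value" name. This is a sketch assuming tidyr 1.0.0 or later, and it is not part of the original answer. The name long2 and the regular expression are illustrative choices that rely on column names of the form week<number><variable>.

```r
library(tidyr)

# Same input as Lines2 above; the first field of each data row is a row name
Lines2 <- "
ID week1var1 week1var2 week2var1 week2var2 week3var1 week3var2 week4var1 week4var2
1 1 2 20 NA NA NA NA NA NA
2 1 NA NA 3 30 NA NA NA NA
3 1 NA NA NA NA 3 30 NA NA
4 1 NA NA NA NA NA NA 4 40
5 2 4 40 NA NA NA NA NA NA
6 2 NA NA 5 50 NA NA NA NA
7 2 NA NA NA NA 3 30 NA NA"
wide2 <- read.table(text = Lines2, header = TRUE)

long2 <- pivot_longer(
  wide2,
  cols = -ID,
  names_to = c("week", ".value"),    # first group -> week, second -> column name
  names_pattern = "week(\\d+)(.*)",  # e.g. week2var1 -> week "2", column var1
  values_drop_na = TRUE              # drop the rows where the values are NA
)

long2
```

Here week comes back as a character column; it can be converted with as.integer if a numeric week is needed.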