在R中的列中转换行

时间:2018-04-24 21:32:47

标签: r database

我有这样的数据集:

ID       DATE      VALUE
9101001 11-04-2010  4
9101001 11-10-2010  4
9101002 28-12-2009  104
9101002 31-03-2010  193
9101002 26-08-2010  130
9101002 13-01-2011  128
9101002 12-04-2011  27
9101002 08-12-2011  18
9101002 17-07-2012  85
9101002 10-10-2012  86
9101002 19-12-2012  4
9101002 21-01-2013  31
9101003 16-09-2008  273
9101003 24-03-2009  311
9101003 15-03-2011  166
9101003 21-04-2011  62

我需要将它转移到这样:

ID       DATE1      VALUE1 DATE2     VALU2 DATE3 VALUE3 etc
9101001  11-04-2010   4   11-10-2010  2

因此,每个ID只有一行

有人可以帮忙吗?非常感谢!

2 个答案:

答案 0 :(得分:0)

使用newdat <- dat %>% group_by(ID) %>% mutate(n = row_number()) %>% ungroup() %>% gather(k, v, -ID, -n) %>% unite(k, c(k, n), sep="") %>% spread(k, v) newdat # # A tibble: 3 × 21 # ID DATE1 DATE10 DATE2 DATE3 DATE4 DATE5 DATE6 # * <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> # 1 9101001 11-04-2010 <NA> 11-10-2010 <NA> <NA> <NA> <NA> # 2 9101002 28-12-2009 21-01-2013 31-03-2010 26-08-2010 13-01-2011 12-04-2011 08-12-2011 # 3 9101003 16-09-2008 <NA> 24-03-2009 15-03-2011 21-04-2011 <NA> <NA> # # ... with 13 more variables: DATE7 <chr>, DATE8 <chr>, DATE9 <chr>, VALUE1 <chr>, # # VALUE10 <chr>, VALUE2 <chr>, VALUE3 <chr>, VALUE4 <chr>, VALUE5 <chr>, VALUE6 <chr>, # # VALUE7 <chr>, VALUE8 <chr>, VALUE9 <chr> 中的两个包:

newdat[c(1, 1L + order(
  as.integer(gsub("[^0-9]", "", colnames(newdat[-1]))),
  colnames(newdat[-1])
))]
# # A tibble: 3 × 21
#        ID      DATE1 VALUE1      DATE2 VALUE2      DATE3 VALUE3      DATE4 VALUE4      DATE5
#     <int>      <chr>  <chr>      <chr>  <chr>      <chr>  <chr>      <chr>  <chr>      <chr>
# 1 9101001 11-04-2010      4 11-10-2010      4       <NA>   <NA>       <NA>   <NA>       <NA>
# 2 9101002 28-12-2009    104 31-03-2010    193 26-08-2010    130 13-01-2011    128 12-04-2011
# 3 9101003 16-09-2008    273 24-03-2009    311 15-03-2011    166 21-04-2011     62       <NA>
# # ... with 11 more variables: VALUE5 <chr>, DATE6 <chr>, VALUE6 <chr>, DATE7 <chr>,
# #   VALUE7 <chr>, DATE8 <chr>, VALUE8 <chr>, DATE9 <chr>, VALUE9 <chr>, DATE10 <chr>,
# #   VALUE10 <chr>

因此,这会为您提供正确的列,但不是正确的顺序。如果那很重要:

c(1L, 1L + ...

$ID的内容是从重新排序中删除dat <- read.table(text='ID DATE VALUE 9101001 11-04-2010 4 9101001 11-10-2010 4 9101002 28-12-2009 104 9101002 31-03-2010 193 9101002 26-08-2010 130 9101002 13-01-2011 128 9101002 12-04-2011 27 9101002 08-12-2011 18 9101002 17-07-2012 85 9101002 10-10-2012 86 9101002 19-12-2012 4 9101002 21-01-2013 31 9101003 16-09-2008 273 9101003 24-03-2009 311 9101003 15-03-2011 166 9101003 21-04-2011 62', header=TRUE, stringsAsFactors=FALSE) 。几乎可以肯定有其他方法可以重新排列列。

可复制数据:

width: 100%

答案 1 :(得分:0)

在这种特殊情况下,每行的列数会有所不同,ID提供了一种优雅的方法,可以在总结基于concatenate的值后实现解决方案。

使用分隔符(比如|),方法是使用DATE和VALUE的splitstackshape::cSplit个值。现在,library(splitstackshape) library(dplyr) df_new <- df %>% group_by(ID) %>% summarise(DATE = paste0(DATE,collapse="|"), VALUE=paste0(VALUE,collapse="|")) %>% cSplit(c("DATE","VALUE"), sep = "|") 可用于分隔这些列。

# ID    DATE_01    DATE_02    DATE_03    DATE_04    DATE_05    DATE_06    DATE_07    DATE_08    DATE_09
# 1: 9101001 11-04-2010 11-10-2010         NA         NA         NA         NA         NA         NA         NA
# 2: 9101002 28-12-2009 31-03-2010 26-08-2010 13-01-2011 12-04-2011 08-12-2011 17-07-2012 10-10-2012 19-12-2012
# 3: 9101003 16-09-2008 24-03-2009 15-03-2011 21-04-2011         NA         NA         NA         NA         NA
# DATE_10 VALUE_01 VALUE_02 VALUE_03 VALUE_04 VALUE_05 VALUE_06 VALUE_07 VALUE_08 VALUE_09 VALUE_10
# 1:         NA        4        4       NA       NA       NA       NA       NA       NA       NA       NA
# 2: 21-01-2013      104      193      130      128       27       18       85       86        4       31
# 3:         NA      273      311      166       62       NA       NA       NA       NA       NA       NA

<强>结果:

df <- read.table(text = 
"ID       DATE      VALUE
9101001 11-04-2010  4
9101001 11-10-2010  4
9101002 28-12-2009  104
9101002 31-03-2010  193
9101002 26-08-2010  130
9101002 13-01-2011  128
9101002 12-04-2011  27
9101002 08-12-2011  18
9101002 17-07-2012  85
9101002 10-10-2012  86
9101002 19-12-2012  4
9101002 21-01-2013  31
9101003 16-09-2008  273
9101003 24-03-2009  311
9101003 15-03-2011  166
9101003 21-04-2011  62",
header = T)

数据:

{{1}}