I have a dataset like this:
ID DATE VALUE
9101001 11-04-2010 4
9101001 11-10-2010 4
9101002 28-12-2009 104
9101002 31-03-2010 193
9101002 26-08-2010 130
9101002 13-01-2011 128
9101002 12-04-2011 27
9101002 08-12-2011 18
9101002 17-07-2012 85
9101002 10-10-2012 86
9101002 19-12-2012 4
9101002 21-01-2013 31
9101003 16-09-2008 273
9101003 24-03-2009 311
9101003 15-03-2011 166
9101003 21-04-2011 62
I need to reshape it into this:
ID DATE1 VALUE1 DATE2 VALUE2 DATE3 VALUE3 etc
9101001 11-04-2010 4 11-10-2010 4
so that there is only one row per ID.
Can anyone help? Thanks a lot!
Answer 0 (score: 0)
Using two packages, dplyr and tidyr:
library(dplyr)
library(tidyr)
newdat <- dat %>%
  group_by(ID) %>%
  mutate(n = row_number()) %>%   # number the observations within each ID
  ungroup() %>%
  gather(k, v, -ID, -n) %>%      # long format: one row per ID, n and variable
  unite(k, c(k, n), sep="") %>%  # build names such as DATE1, VALUE1
  spread(k, v)                   # back to wide: one row per ID
newdat
# # A tibble: 3 × 21
# ID DATE1 DATE10 DATE2 DATE3 DATE4 DATE5 DATE6
# * <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
# 1 9101001 11-04-2010 <NA> 11-10-2010 <NA> <NA> <NA> <NA>
# 2 9101002 28-12-2009 21-01-2013 31-03-2010 26-08-2010 13-01-2011 12-04-2011 08-12-2011
# 3 9101003 16-09-2008 <NA> 24-03-2009 15-03-2011 21-04-2011 <NA> <NA>
# # ... with 13 more variables: DATE7 <chr>, DATE8 <chr>, DATE9 <chr>, VALUE1 <chr>,
# # VALUE10 <chr>, VALUE2 <chr>, VALUE3 <chr>, VALUE4 <chr>, VALUE5 <chr>, VALUE6 <chr>,
# # VALUE7 <chr>, VALUE8 <chr>, VALUE9 <chr>
So, this gives you the right columns, but not in the right order. If that matters:
newdat[c(1, 1L + order(
  as.integer(gsub("[^0-9]", "", colnames(newdat[-1]))),
  colnames(newdat[-1])
))]
# # A tibble: 3 × 21
# ID DATE1 VALUE1 DATE2 VALUE2 DATE3 VALUE3 DATE4 VALUE4 DATE5
# <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
# 1 9101001 11-04-2010 4 11-10-2010 4 <NA> <NA> <NA> <NA> <NA>
# 2 9101002 28-12-2009 104 31-03-2010 193 26-08-2010 130 13-01-2011 128 12-04-2011
# 3 9101003 16-09-2008 273 24-03-2009 311 15-03-2011 166 21-04-2011 62 <NA>
# # ... with 11 more variables: VALUE5 <chr>, DATE6 <chr>, VALUE6 <chr>, DATE7 <chr>,
# # VALUE7 <chr>, DATE8 <chr>, VALUE8 <chr>, DATE9 <chr>, VALUE9 <chr>, DATE10 <chr>,
# # VALUE10 <chr>
The c(1L, 1L + ... part keeps $ID out of the reordering. There are almost certainly other ways to rearrange the columns.
Reproducible data:
dat <- read.table(text='ID DATE VALUE
9101001 11-04-2010 4
9101001 11-10-2010 4
9101002 28-12-2009 104
9101002 31-03-2010 193
9101002 26-08-2010 130
9101002 13-01-2011 128
9101002 12-04-2011 27
9101002 08-12-2011 18
9101002 17-07-2012 85
9101002 10-10-2012 86
9101002 19-12-2012 4
9101002 21-01-2013 31
9101003 16-09-2008 273
9101003 24-03-2009 311
9101003 15-03-2011 166
9101003 21-04-2011 62', header=TRUE, stringsAsFactors=FALSE)
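As one of the other ways to get the columns in order, here is a minimal base R sketch that does not come from the answer itself. It assumes the same `dat` as above and uses `ave()` to number the observations within each ID, then `reshape()` to go wide. The names `n` and `wide` are made up for illustration.

```r
dat <- read.table(text = "ID DATE VALUE
9101001 11-04-2010 4
9101001 11-10-2010 4
9101002 28-12-2009 104
9101002 31-03-2010 193
9101002 26-08-2010 130
9101002 13-01-2011 128
9101002 12-04-2011 27
9101002 08-12-2011 18
9101002 17-07-2012 85
9101002 10-10-2012 86
9101002 19-12-2012 4
9101002 21-01-2013 31
9101003 16-09-2008 273
9101003 24-03-2009 311
9101003 15-03-2011 166
9101003 21-04-2011 62", header = TRUE, stringsAsFactors = FALSE)

# Number the observations within each ID (1, 2, 3, ...).
# `n` is a helper column introduced only for this sketch.
dat$n <- ave(seq_along(dat$ID), dat$ID, FUN = seq_along)

# Spread DATE and VALUE across columns, one row per ID.
# sep = "" gives names like DATE1, VALUE1 rather than DATE.1, VALUE.1.
wide <- reshape(dat, idvar = "ID", timevar = "n",
                direction = "wide", sep = "")
rownames(wide) <- NULL

names(wide)
```

Because `reshape()` fills in the columns one observation number at a time, the DATE and VALUE columns should come out already paired, so no separate reordering step is needed. VALUE also keeps its integer type here, since the two variables are never stacked into one column.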
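Answer 1 (score: 0)
In this particular case, where the number of DATE/VALUE pairs varies per ID, splitstackshape::cSplit provides an elegant way to reach a solution after summarising the values by ID.
First concatenate the DATE and VALUE values for each ID using a separator (say |). cSplit can then be used to separate those columns.
library(splitstackshape)
library(dplyr)
df_new <- df %>% group_by(ID) %>%
  summarise(DATE = paste0(DATE, collapse = "|"),
            VALUE = paste0(VALUE, collapse = "|")) %>%
  cSplit(c("DATE", "VALUE"), sep = "|")
df_new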
Result:
# ID DATE_01 DATE_02 DATE_03 DATE_04 DATE_05 DATE_06 DATE_07 DATE_08 DATE_09
# 1: 9101001 11-04-2010 11-10-2010 NA NA NA NA NA NA NA
# 2: 9101002 28-12-2009 31-03-2010 26-08-2010 13-01-2011 12-04-2011 08-12-2011 17-07-2012 10-10-2012 19-12-2012
# 3: 9101003 16-09-2008 24-03-2009 15-03-2011 21-04-2011 NA NA NA NA NA
# DATE_10 VALUE_01 VALUE_02 VALUE_03 VALUE_04 VALUE_05 VALUE_06 VALUE_07 VALUE_08 VALUE_09 VALUE_10
# 1: NA 4 4 NA NA NA NA NA NA NA NA
# 2: 21-01-2013 104 193 130 128 27 18 85 86 4 31
# 3: NA 273 311 166 62 NA NA NA NA NA NA
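The printed result lists every DATE_ column before the VALUE_ columns, whereas the question asks for DATE/VALUE pairs side by side. Below is a minimal base R sketch of one way to interleave them, assuming the column names follow the DATE_01 / VALUE_01 pattern shown in the result. The data frame `wide` is a hypothetical, shortened stand-in for the real output, and `other`, `idx` and `wide_ordered` are names made up for illustration.

```r
# Hypothetical stand-in for the wide result; only the column names matter.
wide <- data.frame(
  ID       = c(9101001L, 9101003L),
  DATE_01  = c("11-04-2010", "16-09-2008"),
  DATE_02  = c("11-10-2010", "24-03-2009"),
  VALUE_01 = c(4L, 273L),
  VALUE_02 = c(4L, 311L),
  stringsAsFactors = FALSE
)

# Every column except the identifier.
other <- setdiff(names(wide), "ID")

# Numeric suffix of each name (the part after the underscore).
idx <- as.integer(sub(".*_", "", other))

# Sort by suffix first, then by name, so DATE_01 comes before VALUE_01.
wide_ordered <- wide[c("ID", other[order(idx, other)])]

names(wide_ordered)
```

cSplit appears to return a data.table (the row labels `1:`, `2:` in the printed result suggest so), where selecting columns by a character vector behaves differently than in a plain data frame. Converting first, for example with `as.data.frame(df_new)`, should let the same indexing be applied to the real result.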
Data:
df <- read.table(text =
"ID DATE VALUE
9101001 11-04-2010 4
9101001 11-10-2010 4
9101002 28-12-2009 104
9101002 31-03-2010 193
9101002 26-08-2010 130
9101002 13-01-2011 128
9101002 12-04-2011 27
9101002 08-12-2011 18
9101002 17-07-2012 85
9101002 10-10-2012 86
9101002 19-12-2012 4
9101002 21-01-2013 31
9101003 16-09-2008 273
9101003 24-03-2009 311
9101003 15-03-2011 166
9101003 21-04-2011 62",
header = TRUE)