My data frame looks like this:
id=c(10,10,10,25,25,25,300,4000)
yrs=c(2010,2012,2013,2008,2010,2011,2008,2008)
yrf=c(2011,2013,2013,2009,2014,2012,2014,2013)
occ=c(5656,621,7,8,10,8,15,19)
df=data.frame(id,yrs,yrf,occ)
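I want to reshape it so that every combination of id and occ gets its own row for each year in the range yrs to yrf, with that year stored in a new column "year". I also want a new column "sequence" that is simply "occ", unless several rows with the same year and id have different "occ" values, in which case it should be a string of the distinct "occ" values separated by spaces. The finished result would look like this: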
id yrs yrf year occ sequence
10 2010 2011 2010 5656 5656
10 2010 2011 2011 5656 5656
10 2012 2013 2012 621 621
10 2012 2013 2013 621 621 7
10 2013 2013 2013 7 621 7
25 2008 2009 2008 8 8
25 2008 2009 2009 8 8
25 2010 2014 2010 10 10
25 2010 2014 2011 10 10 8
25 2010 2014 2012 10 10 8
25 2010 2014 2013 10 10
25 2010 2014 2014 10 10
25 2011 2012 2011 8 10 8
25 2011 2012 2012 8 10 8
300 2008 2014 2008 15 15
300 2008 2014 2009 15 15
300 2008 2014 2010 15 15
300 2008 2014 2011 15 15
300 2008 2014 2012 15 15
300 2008 2014 2013 15 15
300 2008 2014 2014 15 15
4000 2008 2013 2008 19 19
4000 2008 2013 2009 19 19
4000 2008 2013 2010 19 19
4000 2008 2013 2011 19 19
4000 2008 2013 2012 19 19
4000 2008 2013 2013 19 19
Answer 0 (score: 1)
Here is one approach (commented so you can see what happens at each step):
## Figure out how much longer we need to make the data
Expand <- (df[, "yrf"] - df[, "yrs"])+1
## "expand" the original data.frame with the vector just created
df2 <- df[rep(rownames(df), Expand), ]
## Generate the sequence of years,
## again using the "Expand" vector just created
df2$year <- unlist(lapply(seq_along(Expand), function(x)
  df$yrs[x] + (sequence(Expand[x]) - 1)), use.names = FALSE)
## Use ave, grouping by id and year,
## to paste together the values from the occ column
df2$sequence <- with(df2, ave(occ, id, year, FUN = function(x)
  paste(unique(x), collapse = " ")))
Here are the first 10 rows of the output:
head(df2, 10)
# id yrs yrf occ year sequence
# 1 10 2010 2011 5656 2010 5656
# 1.1 10 2010 2011 5656 2011 5656
# 2 10 2012 2013 621 2012 621
# 2.1 10 2012 2013 621 2013 621 7
# 3 10 2013 2013 7 2013 621 7
# 4 25 2008 2009 8 2008 8
# 4.1 25 2008 2009 8 2009 8
# 5 25 2010 2014 10 2010 10
# 5.1 25 2010 2014 10 2011 10 8
# 5.2 25 2010 2014 10 2012 10 8
Answer 1 (score: 0)
NewRows <- function(x) {
  n <- x[3] - x[2] + 1
  data.frame(rep(x[1], n), rep(x[2], n), rep(x[3], n), x[2]:x[3], rep(x[4], n))
}
GetSequence <- function(x)
  paste(df2$occ[df2[, 1] == x[1] & df2[, 4] == x[4]], collapse = " ")
df2 <- do.call("rbind", apply(df, 1, NewRows))
colnames(df2) <- c("id", "yrs", "yrf", "year", "occ")
df2$sequence <- apply(df2, 1, GetSequence)
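Note that apply() converts the data frame to a matrix first, so the code above relies on every column being numeric. A possible variant that works column by column with Map() is sketched below. It is an illustration only: the names pieces, df3, and key are made up for this example, and it assumes yrs <= yrf in each row.

```r
# Rebuild the example data from the question
df <- data.frame(
  id  = c(10, 10, 10, 25, 25, 25, 300, 4000),
  yrs = c(2010, 2012, 2013, 2008, 2010, 2011, 2008, 2008),
  yrf = c(2011, 2013, 2013, 2009, 2014, 2012, 2014, 2013),
  occ = c(5656, 621, 7, 8, 10, 8, 15, 19)
)

## One small data frame per original row; scalars are recycled over yrs:yrf
pieces <- Map(function(id, yrs, yrf, occ) {
  data.frame(id = id, yrs = yrs, yrf = yrf, year = yrs:yrf, occ = occ)
}, df$id, df$yrs, df$yrf, df$occ)
df3 <- do.call(rbind, pieces)

## Paste the occ values that share an id and a year,
## then look the result up again for every row
key <- paste(df3$id, df3$year)
df3$sequence <- as.vector(tapply(df3$occ, key, paste, collapse = " ")[key])
```

Like GetSequence above, this pastes every occ value in the group; wrap the values in unique() first if repeated values within one id and year should appear only once.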