Reshape a data frame to add a row for each year in a range and add a new sequence column

Time: 2013-12-06 17:04:46

Tags: r reshape

My data frame looks like this:

id=c(10,10,10,25,25,25,300,4000)
yrs=c(2010,2012,2013,2008,2010,2011,2008,2008)
yrf=c(2011,2013,2013,2009,2014,2012,2014,2013)
occ=c(5656,621,7,8,10,8,15,19)
df=data.frame(id,yrs,yrf,occ)

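I want to reshape it so that every id/occ combination gets its own row for each year in the range from yrs to yrf, with that year stored in a new column, "year". I also want a new column, "sequence", which is simply "occ" unless several rows with the same id and year have different "occ" values, in which case it is a string of the distinct "occ" values separated by spaces. The finished product would look like this: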

id    yrs   yrf   year  occ   sequence
10    2010  2011  2010  5656  5656
10    2010  2011  2011  5656  5656
10    2012  2013  2012  621   621
10    2012  2013  2013  621   621 7
10    2013  2013  2013  7     621 7
25    2008  2009  2008  8     8
25    2008  2009  2009  8     8
25    2010  2014  2010  10    10
25    2010  2014  2011  10    10 8
25    2010  2014  2012  10    10 8
25    2010  2014  2013  10    10
25    2010  2014  2014  10    10
25    2011  2012  2011  8     10 8
25    2011  2012  2012  8     10 8
300   2008  2014  2008  15    15
300   2008  2014  2009  15    15
300   2008  2014  2010  15    15
300   2008  2014  2011  15    15
300   2008  2014  2012  15    15
300   2008  2014  2013  15    15
300   2008  2014  2014  15    15
4000  2008  2013  2008  19    19
4000  2008  2013  2009  19    19
4000  2008  2013  2010  19    19
4000  2008  2013  2011  19    19
4000  2008  2013  2012  19    19
4000  2008  2013  2013  19    19

2 answers:

Answer 0 (score: 1)

Here is one approach (with comments, so you can see what is happening at each step):

## Figure out how much longer we need to make the data
Expand <- (df[, "yrf"] - df[, "yrs"])+1

## "expand" the original data.frame with the vector just created
df2 <- df[rep(rownames(df), Expand), ]

## Generate the sequence of years, 
##    again using the "Expand" vector just created
df2$year <- unlist(lapply(seq_along(Expand), function(x) 
  df$yrs[x] + (sequence(Expand[x])-1)), use.names = FALSE)

## Use ave, grouping by id and year, 
##    to paste together the values from the occ column
df2$sequence <- with(df2, ave(occ, id, year, FUN = function(x) 
  paste(unique(x), collapse = " ")))

Here are the first 10 rows of the output:

head(df2, 10)
#     id  yrs  yrf  occ year sequence
# 1   10 2010 2011 5656 2010     5656
# 1.1 10 2010 2011 5656 2011     5656
# 2   10 2012 2013  621 2012      621
# 2.1 10 2012 2013  621 2013    621 7
# 3   10 2013 2013    7 2013    621 7
# 4   25 2008 2009    8 2008        8
# 4.1 25 2008 2009    8 2009        8
# 5   25 2010 2014   10 2010       10
# 5.1 25 2010 2014   10 2011     10 8
# 5.2 25 2010 2014   10 2012     10 8
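
As a variation on the steps above, the per-row `lapply` that builds `year` can be dropped, because `sequence()` already accepts the whole `Expand`-style vector at once. The following is a minimal sketch under the assumption that the data is the question's example; the names `n_years` and `long` are illustrative and not part of the answer, and the final column reordering is only there to match the layout requested in the question.

```r
# Rebuild the example data from the question
df <- data.frame(
  id  = c(10, 10, 10, 25, 25, 25, 300, 4000),
  yrs = c(2010, 2012, 2013, 2008, 2010, 2011, 2008, 2008),
  yrf = c(2011, 2013, 2013, 2009, 2014, 2012, 2014, 2013),
  occ = c(5656, 621, 7, 8, 10, 8, 15, 19)
)

# Number of rows each original row must become (inclusive year range)
n_years <- df$yrf - df$yrs + 1

# Repeat each row n_years times, then count up from yrs within each repeat
long <- df[rep(seq_len(nrow(df)), n_years), ]
long$year <- long$yrs + sequence(n_years) - 1
rownames(long) <- NULL

# For every id/year group, paste the distinct occ values together
long$sequence <- ave(as.character(long$occ), long$id, long$year,
                     FUN = function(x) paste(unique(x), collapse = " "))

# Reorder columns to the layout asked for in the question
long <- long[, c("id", "yrs", "yrf", "year", "occ", "sequence")]
```

Converting `occ` to character before calling `ave()` keeps the input and output types the same, rather than relying on the numeric vector being coerced when the pasted strings are assigned back.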

Answer 1 (score: 0)

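## Expand one row of df (passed by apply() as a vector: id, yrs, yrf, occ)
##    into one row per year from yrs to yrf
NewRows <- function(x) {
  n <- x[3] - x[2] + 1
  data.frame(rep(x[1], n), rep(x[2], n), rep(x[3], n), x[2]:x[3], rep(x[4], n))
}

## For one row of df2, paste together the occ values of all rows
##    that share its id (column 1) and year (column 4)
GetSequence <- function(x) 
  paste(df2$occ[df2[, 1] == x[1] & df2[, 4] == x[4]], collapse = " ")

## Build the expanded data frame row by row and name its columns
df2 <- do.call("rbind", apply(df, 1, NewRows))
colnames(df2) <- c("id", "yrs", "yrf", "year", "occ")

## GetSequence looks up df2 when it is called, so df2 must exist by now
df2$sequence <- apply(df2, 1, GetSequence)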
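
One difference from the first answer is worth noting: `GetSequence` pastes every matching `occ` value, whereas the first answer wraps the values in `unique()`. The example data never repeats an `occ` within the same id and year, so both give the same result there. A small sketch of how they would differ, using a made-up vector `occ_same_year` that stands in for the `occ` values of one id/year group:

```r
# Hypothetical occ values for a single id/year group, with one repeat
occ_same_year <- c(8, 8, 10)

# Pasting as-is keeps the repeated value
all_values <- paste(occ_same_year, collapse = " ")

# Pasting after unique() keeps only the distinct values
distinct_values <- paste(unique(occ_same_year), collapse = " ")

print(all_values)       # "8 8 10"
print(distinct_values)  # "8 10"
```

If repeated values are possible in the real data and only distinct ones are wanted, wrapping the subset in `unique()` inside `GetSequence` would be one way to bring it in line with the question's description.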