Expanding a range of years in R

Time: 2015-02-17 02:33:06

Tags: r

I am currently working with the Ethnic Power Relations 2014 data set. Here is a small piece of the data I want to manipulate:

     statename       from   to    gwgroupid size
[,1] United States   1966   2008  201000    0.691
[,2] United States   1966   2008  201000    0.125
[,3] United States   1966   2008  203000    0.124

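Here from and to are the first and last year of observation, and gwgroupid is an identifier for a specific ethnic group in a specific country.

I want to expand the data set so that it has one observation for every year in the range described by from and to, and then drop from and to. The first three rows of the expanded data set would look like this: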

     statename       year    gwgroupid size
[,1] United States   1966    201000    0.691
[,2] United States   1967    201000    0.691
[,3] United States   1968    201000    0.691

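Given that the year range differs from country to country, how can I do this?

2 answers:

Answer 0: (score: 1)

You can use the unnest function from the tidyr package: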

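library(dplyr)  # provides %>% and select()
library(tidyr)  # provides unnest()

# Build a list column holding the full sequence of years for each row
df$year <- mapply(seq, df$from, df$to, SIMPLIFY = FALSE)

# One row per year, then drop the original range columns
df %>%
  unnest(year) %>%
  select(-from, -to)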

#       statename gwgroupid  size year
#1   UnitedStates    201000 0.691 1966
#2   UnitedStates    201000 0.691 1967
#3   UnitedStates    201000 0.691 1968
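
The key step is the list column built with mapply. A minimal base R sketch of that step on its own, using two hypothetical ranges of different lengths (these values are not from the data set), shows why SIMPLIFY = FALSE is used: without it, mapply returns a matrix whenever every range happens to have the same length.

```r
# Hypothetical ranges of different lengths, for illustration only
from <- c(1966, 2000)
to   <- c(1968, 2001)

# One vector of years per row; SIMPLIFY = FALSE always returns a list
years <- mapply(seq, from, to, SIMPLIFY = FALSE)

lengths(years)  # 3 2
unlist(years)   # 1966 1967 1968 2000 2001
```

Each list element becomes as many rows as it has years once unnest is applied.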

[Update] Alternatively, you can use the data.table package:

library(data.table)
as.data.table(df)[,.(year=seq(from,to)),by=.(statename,gwgroupid,size)]
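
For comparison, the same expansion can probably be done without any packages. The following is a sketch in base R, not part of the original answer; the data frame is reconstructed from the sample in the question, and the names n_years, idx and expanded are made up for illustration.

```r
# Sample rows reconstructed from the question, for illustration
df <- data.frame(
  statename = "United States",
  from      = c(1966, 1966, 1966),
  to        = c(2008, 2008, 2008),
  gwgroupid = c(201000, 201000, 203000),
  size      = c(0.691, 0.125, 0.124)
)

# Number of years in each range, counting both endpoints
n_years <- df$to - df$from + 1

# Index that repeats each original row once per year
idx <- rep(seq_len(nrow(df)), times = n_years)

expanded <- data.frame(
  statename = df$statename[idx],
  # sequence() restarts at 1 for every row, so the years restart at 'from'
  year      = rep(df$from, times = n_years) + sequence(n_years) - 1,
  gwgroupid = df$gwgroupid[idx],
  size      = df$size[idx]
)

head(expanded, 3)  # years 1966, 1967, 1968 for the first group
```

Because the row index and the year offsets are both derived from n_years, rows with different ranges are handled without any grouping.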

Answer 1: (score: 0)

This works, though there may be a cleaner, quicker way:

Your data:

df<-
read.table(text="
statename       from   to    gwgroupid size
UnitedStates   1966   2008  201000    0.691
UnitedStates   1966   2008  202000    0.125
UnitedStates   1966   2008  203000    0.124", header=T)


library(dplyr)

# Number of years in each range, counting both endpoints
df$freq <- df$to - df$from + 1
# Tag each original row so the year counter restarts for every row
df$id <- seq_len(nrow(df))
# Repeat each row once per year
df.expanded <- df[rep(seq_len(nrow(df)), df$freq), ]

df.expanded %>%
  group_by(id) %>%
  mutate(year = from + row_number() - 1) %>%
  ungroup() %>%
  select(statename, year, gwgroupid, size)

which gives:

      statename year gwgroupid  size
1  UnitedStates 1966    201000 0.691
2  UnitedStates 1967    201000 0.691
3  UnitedStates 1968    201000 0.691
4  UnitedStates 1969    201000 0.691
5  UnitedStates 1970    201000 0.691
6  UnitedStates 1971    201000 0.691
7  UnitedStates 1972    201000 0.691
8  UnitedStates 1973    201000 0.691
9  UnitedStates 1974    201000 0.691
10 UnitedStates 1975    201000 0.691
..          ...  ...       ...   ...

Edit: I just noticed that your desired result has 'gwgroupid' increasing across rows 1-3 while size stays the same... is your desired result correct?
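
When rows are repeated by a count, as in this answer, the count has to include both endpoints of the range. A small check, not part of the original answer, using the 1966 to 2008 range from the question:

```r
from <- 1966
to   <- 2008

n_diff      <- to - from             # 42: one short of the number of years
n_inclusive <- length(seq(from, to)) # 43: both 1966 and 2008 are counted

# The number of rows to create for a range is therefore to - from + 1
stopifnot(n_inclusive == to - from + 1)
```

The same reasoning applies to the year offset: the first repeated row should get from + 0, not from + 1.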