I am currently working with the Ethnic Power Relations 2014 data set. Here is a small excerpt of the data I want to manipulate:
statename from to gwgroupid size
[,1] United States 1966 2008 201000 0.691
[,2] United States 1966 2008 201000 0.125
[,3] United States 1966 2008 203000 0.124
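Here from and to are the first and last years of observation, and gwgroupid is an identifier for a specific ethnic group in a specific country.
I want to expand the data set so that it contains one observation for every year in the range described by from and to, and then drop from and to. The first three rows of the expanded data set would look like this: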
statename year gwgroupid size
[,1] United States 1966 201000 0.691
[,2] United States 1967 201000 0.691
[,3] United States 1968 201000 0.691
Given that the year range differs from country to country, how can I do this?
Answer 0 (score: 1)
You can use the unnest function from the tidyr package:
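library(tidyr)
library(dplyr)  # provides select()
# Build a list column holding the full sequence of years for each row
df$year <- mapply(seq, df$from, df$to, SIMPLIFY = FALSE)
df %>%
  unnest(year) %>%
  select(-from, -to)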
# statename gwgroupid size year
#1 UnitedStates 201000 0.691 1966
#2 UnitedStates 201000 0.691 1967
#3 UnitedStates 201000 0.691 1968
[Update] Alternatively, you can use the data.table package:
library(data.table)
as.data.table(df)[,.(year=seq(from,to)),by=.(statename,gwgroupid,size)]
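If neither package is available, the same expansion can be sketched in base R. This is only an illustration under stated assumptions: the three sample rows are typed in by hand, and the names n_years and expanded are hypothetical, not part of the data set. Because it works row by row, it does not depend on the grouping columns identifying each row uniquely.

```r
# Sample rows, typed in by hand for illustration
df <- read.table(text = "
statename from to gwgroupid size
UnitedStates 1966 2008 201000 0.691
UnitedStates 1966 2008 202000 0.125
UnitedStates 1966 2008 203000 0.124", header = TRUE)

# Number of years covered by each row, counting both endpoints
n_years <- df$to - df$from + 1

# Repeat each row once per covered year, keeping only the columns we need
expanded <- df[rep(seq_len(nrow(df)), n_years),
               c("statename", "gwgroupid", "size")]

# Attach the matching year to every repeated row
expanded$year <- unlist(Map(seq, df$from, df$to))
rownames(expanded) <- NULL

head(expanded, 3)
```

The year column is built from the same from and to values that drive the row repetition, so the two stay aligned even when the ranges differ between rows.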
Answer 1 (score: 0)
This works, though there may be a cleaner and faster way.
Your data:
df<-
read.table(text="
statename from to gwgroupid size
UnitedStates 1966 2008 201000 0.691
UnitedStates 1966 2008 202000 0.125
UnitedStates 1966 2008 203000 0.124", header=T)
library(dplyr)
# Years covered by each row, counting both endpoints (from and to inclusive)
df$freq <- df$to - df$from + 1
# Repeat each row once per year it covers
df.expanded <- df[rep(seq_len(nrow(df)), df$freq), 1:5]
rownames(df.expanded) <- NULL
df.expanded %>%
  # sequence() restarts at 1 for every original row, so years run from..to
  mutate(year = from + sequence(df$freq) - 1) %>%
  select(statename, year, gwgroupid, size)
Which gives:
statename year gwgroupid size
1 UnitedStates 1966 201000 0.691
2 UnitedStates 1967 201000 0.691
3 UnitedStates 1968 201000 0.691
4 UnitedStates 1969 201000 0.691
5 UnitedStates 1970 201000 0.691
6 UnitedStates 1971 201000 0.691
7 UnitedStates 1972 201000 0.691
8 UnitedStates 1973 201000 0.691
9 UnitedStates 1974 201000 0.691
10 UnitedStates 1975 201000 0.691
.. ... ... ... ...
Edit: I just noticed that your desired result has 'gwgroupid' increasing across rows 1-3 while size stays the same. Is your desired result correct?