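I have a very newbie question. I am working with the Aid Worker Security Database, which records incidents of violence against aid workers from 1997 to the present. Each incident is a separate row in the dataset. I would like to merge all incidents that occurred in the same country in a given year, summing the values of the other variables, and build a simple time series covering the same years for every country (1997-2013). Any idea how to do it?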
df
# year country totalnationals internationalskilled
# 1 1997 Rwanda 0 3
# 2 1997 Cambodia 1 0
# 3 1997 Somalia 0 1
# 4 1997 Rwanda 1 0
# 5 1997 DR Congo 10 0
# 6 1997 Somalia 1 0
# 7 1997 Rwanda 1 0
# 8 1998 Angola 5 0
where `df` is defined as:
df <- structure(list(year = c(1997L, 1997L, 1997L, 1997L, 1997L, 1997L,
1997L, 1998L), country = c("Rwanda", "Cambodia", "Somalia", "Rwanda",
"DR Congo", "Somalia", "Rwanda", "Angola"), totalnationals = c(0L,
1L, 0L, 1L, 10L, 1L, 1L, 5L), internationalskilled = c(3L, 0L,
1L, 0L, 0L, 0L, 0L, 0L)), .Names = c("year", "country", "totalnationals",
"internationalskilled"), class = "data.frame", row.names = c(NA, -8L))
I would like to end up with something like this:
# year country totalnationals internationalskilled
# 1 1997 Rwanda 2 3
# 2 1997 Cambodia 1 0
# 3 1997 Somalia 1 1
# 4 1997 DR Congo 10 0
# 5 1997 Angola 0 0
# 6 1998 Rwanda 0 0
# 7 1998 Cambodia 0 0
# 8 1998 Somalia 0 0
# 9 1998 DR Congo 0 0
# 10 1998 Angola 5 0
Sorry for the very, very newbie question... but so far I have not been able to figure out how to do this. Thanks! :-)
Answer 0 (score: 1)
Updated after the OP's comment:
df <- subset(df, year <= 2013 & year >= 1997)
df$totalnationals <- as.integer(df$totalnationals)
df$internationalskilled <- as.integer(df$internationalskilled)
df2 <- aggregate(cbind(totalnationals, internationalskilled) ~ year + country, data = df, FUN = sum)
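To add zero rows for year/country combinations that have no records:
# Name the grid columns so the merged result keeps "year" and "country",
# and span the full 1997-2013 range rather than only the years present in df
df3 <- expand.grid(year = 1997:2013, country = unique(df$country), stringsAsFactors = FALSE)
df3 <- merge(df3, df2, all.x = TRUE, by = c("year", "country"))
df3[is.na(df3)] <- 0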
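For reference, here is a self-contained sketch of the aggregate-then-fill approach, run on the sample data from the question. The names `sums`, `grid`, `panel` and `years` are only illustrative, and `years` is set to 1997:1998 because that is all the sample covers; with the real data it would presumably be 1997:2013.

```r
# Sample data copied from the question
df <- data.frame(
  year = c(1997L, 1997L, 1997L, 1997L, 1997L, 1997L, 1997L, 1998L),
  country = c("Rwanda", "Cambodia", "Somalia", "Rwanda",
              "DR Congo", "Somalia", "Rwanda", "Angola"),
  totalnationals = c(0L, 1L, 0L, 1L, 10L, 1L, 1L, 5L),
  internationalskilled = c(3L, 0L, 1L, 0L, 0L, 0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# Sum both count columns within each year/country pair
sums <- aggregate(cbind(totalnationals, internationalskilled) ~ year + country,
                  data = df, FUN = sum)

# Every year/country combination (assumption: the sample only spans 1997-1998)
years <- 1997:1998
grid <- expand.grid(year = years, country = unique(df$country),
                    stringsAsFactors = FALSE)

# Left join onto the full grid, then turn the missing combinations into zeros
panel <- merge(grid, sums, by = c("year", "country"), all.x = TRUE)
panel[is.na(panel)] <- 0
panel <- panel[order(panel$year, panel$country), ]
```

Every country should then appear once per year in `panel`, which can be checked with `table(panel$country)`.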
Answer 1 (score: 1)
The same thing with data.table (which can be faster on large datasets).
library(data.table)
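# Keyed copy of the data; the key is what the join below matches on
dt <- data.table(df, key = "year,country")
# One row per year/country with both count columns summed
smry <- dt[, list(totalnationals = sum(totalnationals),
                  internationalskilled = sum(internationalskilled)),
           by = "year,country"]
countries <- unique(dt$country)
# Full grid: every country in every year from 1997 to 2013
template <- data.table(year = rep(1997:2013, each = length(countries)),
                       country = countries,
                       key = "year,country")
# Join the sums onto the grid; combinations with no incidents come back as NA
time.series <- smry[template]
time.series[is.na(time.series)] <- 0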
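A variant of the same idea, sketched here on the sample data from the question, builds the grid with `CJ()` and joins with `on =` so that it does not depend on the tables being keyed. The names `template` and `panel` are illustrative, and this is one possible way to write it rather than the answerer's own code.

```r
library(data.table)

# Sample data copied from the question
dt <- data.table(
  year = c(1997L, 1997L, 1997L, 1997L, 1997L, 1997L, 1997L, 1998L),
  country = c("Rwanda", "Cambodia", "Somalia", "Rwanda",
              "DR Congo", "Somalia", "Rwanda", "Angola"),
  totalnationals = c(0L, 1L, 0L, 1L, 10L, 1L, 1L, 5L),
  internationalskilled = c(3L, 0L, 1L, 0L, 0L, 0L, 0L, 0L)
)

# Sum both count columns within each year/country pair
smry <- dt[, .(totalnationals = sum(totalnationals),
               internationalskilled = sum(internationalskilled)),
           by = .(year, country)]

# Cross join: every country in every year from 1997 to 2013
template <- CJ(year = 1997:2013, country = unique(dt$country))

# Join the sums onto the full grid; unmatched combinations are NA
panel <- smry[template, on = .(year, country)]

# Replace the NAs with zero by reference, one count column at a time
for (col in c("totalnationals", "internationalskilled")) {
  set(panel, i = which(is.na(panel[[col]])), j = col, value = 0L)
}
```

With 17 years and 5 countries, `panel` should have 85 rows, one per year/country combination.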