Question

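I have a very basic beginner question. I am using the Aid Worker Security Database, which records incidents of violence against aid workers, with incident reports from 1997 to the present. Each incident is recorded as a separate row in the dataset. I would like to combine all incidents that occurred in a given country in a given year, sum the values of the other variables, and create a simple time series with the same number of years (1997-2013) for every country. Any idea how to do this?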

df
#   year  country totalnationals internationalskilled
# 1 1997   Rwanda              0                    3
# 2 1997 Cambodia              1                    0
# 3 1997  Somalia              0                    1
# 4 1997   Rwanda              1                    0
# 5 1997 DR Congo             10                    0
# 6 1997  Somalia              1                    0
# 7 1997   Rwanda              1                    0
# 8 1998   Angola              5                    0

where "df" is defined as:

df <- structure(list(year = c(1997L, 1997L, 1997L, 1997L, 1997L, 1997L, 
  1997L, 1998L), country = c("Rwanda", "Cambodia", "Somalia", "Rwanda", 
  "DR Congo", "Somalia", "Rwanda", "Angola"), totalnationals = c(0L, 
  1L, 0L, 1L, 10L, 1L, 1L, 5L), internationalskilled = c(3L, 0L, 
  1L, 0L, 0L, 0L, 0L, 0L)), .Names = c("year", "country", "totalnationals", 
  "internationalskilled"), class = "data.frame", row.names = c(NA, -8L))

I would like to get something like this:

#    year  country totalnationals internationalskilled
# 1  1997   Rwanda              2                    3
# 2  1997 Cambodia              1                    0
# 3  1997  Somalia              1                    1
# 4  1997 DR Congo             10                    0
# 5  1997   Angola              0                    0
# 6  1998   Rwanda              0                    0
# 7  1998 Cambodia              0                    0
# 8  1998  Somalia              0                    0
# 9  1998 DR Congo              0                    0
# 10 1998   Angola              5                    0

Sorry, this is a very, very basic question... but so far I have not been able to figure out how to do it. Thanks! :-)

Answer 1

Updated after the OP's comment:

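# Keep only the years of interest
df <- subset(df, year >= 1997 & year <= 2013)
# Make sure the count columns are integers before summing
df$totalnationals <- as.integer(df$totalnationals)
df$internationalskilled <- as.integer(df$internationalskilled)
# Sum both count columns within each year/country combination
df2 <- aggregate(cbind(totalnationals, internationalskilled) ~ year + country,
                 data = df, FUN = sum)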

To add rows with 0 for the years in which a country has no records:

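# Every year/country combination (only years present in df; use 1997:2013 for the full range)
df3 <- expand.grid(year = unique(df$year), country = unique(df$country),
                   stringsAsFactors = FALSE)
df3 <- merge(df3, df2, all.x = TRUE, by = c("year", "country"))
# Combinations with no incidents come back as NA; set them to 0
df3[is.na(df3)] <- 0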
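
Since the question asks for the same 1997-2013 range for every country, the steps above can be sketched end to end with the year range spelled out, so that years with no incidents at all still appear. This is a minimal sketch on the sample data from the question; the names sums, grid, panel and value_cols are illustrative and not part of the original answer.

```r
# Sample data from the question
df <- data.frame(
  year = c(1997L, 1997L, 1997L, 1997L, 1997L, 1997L, 1997L, 1998L),
  country = c("Rwanda", "Cambodia", "Somalia", "Rwanda",
              "DR Congo", "Somalia", "Rwanda", "Angola"),
  totalnationals = c(0L, 1L, 0L, 1L, 10L, 1L, 1L, 5L),
  internationalskilled = c(3L, 0L, 1L, 0L, 0L, 0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# Sum both count columns per year/country
sums <- aggregate(cbind(totalnationals, internationalskilled) ~ year + country,
                  data = df, FUN = sum)

# One row per year/country for the whole 1997-2013 range (assumed range,
# taken from the question), whether or not any incident was recorded
grid <- expand.grid(year = 1997:2013, country = unique(df$country),
                    stringsAsFactors = FALSE)

# Left join: combinations without incidents get NA, which we replace with 0
panel <- merge(grid, sums, by = c("year", "country"), all.x = TRUE)
value_cols <- c("totalnationals", "internationalskilled")
panel[value_cols][is.na(panel[value_cols])] <- 0

# Sort for readability
panel <- panel[order(panel$year, panel$country), ]
head(panel)
```

Restricting the NA replacement to the two count columns keeps the year and country columns untouched.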

Answer 2

The same with data.table (which can be faster on large datasets).

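library(data.table)
dt   <- data.table(df, key = c("year", "country"))
# Sum both count columns per year/country; keyby keeps the result keyed for the join
smry <- dt[, list(totalnationals       = sum(totalnationals),
                  internationalskilled = sum(internationalskilled)),
           keyby = c("year", "country")]
countries   <- unique(dt$country)
# Template with one row per year/country for 1997-2013
template    <- data.table(year = rep(1997:2013, each = length(countries)),
                          country = countries,
                          key = c("year", "country"))
# Join the sums onto the template, then replace missing sums with 0
time.series <- smry[template]
time.series[is.na(time.series)] <- 0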
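
As a possible variation on the above, the template can be built with CJ() and joined with an explicit on= clause, so that no keys need to be set. This is only a sketch on the sample data from the question, and it assumes a reasonably recent data.table (setnafill() was added in version 1.12.4); the name value_cols is illustrative.

```r
library(data.table)

# Sample data from the question
df <- data.frame(
  year = c(1997L, 1997L, 1997L, 1997L, 1997L, 1997L, 1997L, 1998L),
  country = c("Rwanda", "Cambodia", "Somalia", "Rwanda",
              "DR Congo", "Somalia", "Rwanda", "Angola"),
  totalnationals = c(0L, 1L, 0L, 1L, 10L, 1L, 1L, 5L),
  internationalskilled = c(3L, 0L, 1L, 0L, 0L, 0L, 0L, 0L),
  stringsAsFactors = FALSE
)
dt <- as.data.table(df)
value_cols <- c("totalnationals", "internationalskilled")

# Sum the count columns per year/country
smry <- dt[, lapply(.SD, sum), by = .(year, country), .SDcols = value_cols]

# Cross join: one row per year/country over the assumed 1997-2013 range
template <- CJ(year = 1997:2013, country = unique(dt$country))

# Keep every template row; combinations without incidents get NA
time.series <- smry[template, on = .(year, country)]

# Replace NA with 0 by reference, in the count columns only
setnafill(time.series, fill = 0L, cols = value_cols)
head(time.series)
```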

Combining cases into one in R
