Question

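df contains battle events and conflicts over several years. I am trying to compute the average distance (in time) between battles within each conflict-year.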

The head of the data looks like this:

conflictId | year | event_date | event_type
107          1997   1997-01-01   1
107          1997   1997-01-01   1
20           1997   1997-01-01   1
20           1997   1997-01-01   2
20           1997   1997-01-03   1

My initial attempt was

time_prev_total <- aggregate (event_date ~ conflictId + year, data, diff)

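but I end up with event_date as a list column in the new df. My attempts to extract the first element of that list from within the df were unsuccessful.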

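Alternatively, someone suggested that I create a time index within each conflict-year, lag that index, build a new data frame containing conflictId, year, event_date, and the lagged index, and then merge it with the original df, matching the lagged index in the new df against the old index in the original df. I tried to implement this, but I don't know how to index the observations within each conflict-year, because the data is unbalanced.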

Answer 1

You can use ddply to split the data.frame into pieces (one per year and conflict) and apply a function to each piece.

# Sample data
n <- 100
d <- data.frame(
  conflictId = sample(1:3,       n, replace=TRUE),
  year       = sample(1990:2000, n, replace=TRUE),
  event_date = sample(0:364,     n, replace=TRUE),
  event_type = sample(1:10,      n, replace=TRUE)
)
d$event_date <- as.Date(ISOdate(d$year,1,1)) + d$event_date
library(plyr)

# Average distance (in days) between all pairs of battles, within each year and conflict
ddply(
  d, 
  c("year","conflictId"), 
  summarize,
  average = mean(dist(event_date))
)

# Average distance (in days) between consecutive battles, within each year and conflict
# (sort by date first so that diff() compares neighbouring events)
d <- d[order(d$event_date),]
ddply(
  d, 
  c("year","conflictId"), 
  summarize,
  average = mean(diff(event_date))
)
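
If you would rather stay in base R, the original aggregate() attempt can probably be salvaged by passing a function that returns a single number per group instead of the whole diff() vector, which is what produced the list column. The following is only a sketch: it rebuilds the five sample rows from the question, and mean_gap and avg_days_between are names made up for this illustration. The last lines show on conflict 20 how the "all pairs" average differs from the "consecutive" average.

```r
# The five sample rows shown in the question
d <- data.frame(
  conflictId = c(107, 107, 20, 20, 20),
  year       = c(1997, 1997, 1997, 1997, 1997),
  event_date = as.Date(c("1997-01-01", "1997-01-01", "1997-01-01",
                         "1997-01-01", "1997-01-03")),
  event_type = c(1, 1, 1, 2, 1)
)

# Hypothetical helper: mean gap in days between consecutive events.
# Returns NA when a group has fewer than two events.
mean_gap <- function(dates) {
  days <- sort(as.numeric(dates))   # days since 1970-01-01, in date order
  if (length(days) < 2) return(NA_real_)
  mean(diff(days))
}

# One scalar per conflict-year, so the result column is a plain numeric vector
avg_gap <- aggregate(event_date ~ conflictId + year, data = d, FUN = mean_gap)
names(avg_gap)[3] <- "avg_days_between"
avg_gap

# Pairwise vs. consecutive average for conflict 20
x20 <- as.numeric(d$event_date[d$conflictId == 20])
pairwise_20    <- mean(as.numeric(dist(x20)))  # averages all pairs of events
consecutive_20 <- mean(diff(sort(x20)))        # averages neighbouring events only
```

Because mean_gap sorts inside each group, the data does not need to be ordered beforehand. Groups with a single event come out as NA here, which you may want to handle differently.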

Average time distance between grouped events
