Question

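Suppose I have a data frame with 5 years of data showing the number of homicides in the 50 largest cities in each of the 50 US states. The data frame also holds each city's population and the number of guns owned. However, each row contains only one of population, homicides or guns (see df in the example below):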

df1 = data.frame(state=1:50, city=rep(1:50, each=50), year=rep(1:5, each=2500), population=sample(1000:200000,12500), homicides=NA, guns=NA)
df2 = data.frame(state=1:50, city=rep(1:50, each=50), year=rep(1:5, each=2500), population=NA, homicides=sample(1:200,12500,replace=TRUE), guns=NA)
df3 = data.frame(state=1:50, city=rep(1:50, each=50), year=rep(1:5, each=2500), population=NA, homicides=NA, guns=round((df1$population/sample(2:20,12500,replace=TRUE))))
df = rbind(df1, df2, df3)

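Since each row representing a unique combination of state, city and year could hold the population, homicides and guns data together rather than just one of them, the resulting data frame is 25,000 rows longer than it needs to be. In other words, it could look like this: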

df.ideal = data.frame(state=1:50, city=rep(1:50, each=50), year=rep(1:5, each=2500), population=sample(1000:200000,12500), homicides=sample(1:200,12500,replace=TRUE), guns=round((df1$population/sample(2:20,12500,replace=TRUE))))

Starting from df, how can I merge the population, guns and homicides rows to create a single row for each state, city, year combination, and so end up with df.ideal?

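Sadly, the solution also has to work for unbalanced data frames. In an ideal world, it would also be nice to get a warning whenever a value would replace anything other than NA.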
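To make the requirement concrete, here is a minimal base-R sketch of one way this could be approached, not a definitive solution. The names collapse_one, collapse_rows and toy are hypothetical and introduced only for illustration. It assumes that a "conflict" means two different non-NA values for the same column within one state, city, year group; identical duplicates are collapsed silently.

```r
# Hypothetical helper: collapse one column within one group.
# Returns the single non-NA value, NA if there is none, and warns
# when the group holds more than one distinct non-NA value.
collapse_one <- function(x) {
  vals <- unique(x[!is.na(x)])
  if (length(vals) == 0) return(x[1])  # all NA: keep an NA of the same type
  if (length(vals) > 1) {
    warning("conflicting non-NA values in a group; keeping the first")
  }
  vals[1]
}

# Hypothetical wrapper: one output row per unique combination of the key columns.
collapse_rows <- function(data, keys, values) {
  out <- aggregate(data[values], by = data[keys], FUN = collapse_one)
  # Sort by the keys so the row order is predictable.
  out[do.call(order, unname(as.list(out[keys]))), , drop = FALSE]
}

# Small unbalanced example: state 2 has no homicides row at all.
toy <- data.frame(
  state      = c(1, 1, 1, 2, 2),
  city       = c(1, 1, 1, 1, 1),
  year       = c(1, 1, 1, 1, 1),
  population = c(5000, NA, NA, 8000, NA),
  homicides  = c(NA, 12, NA, NA, NA),
  guns       = c(NA, NA, 700, NA, 400)
)

keys   <- c("state", "city", "year")
values <- c("population", "homicides", "guns")
merged <- collapse_rows(toy, keys, values)
```

On the toy data this gives two rows, with homicides left as NA for state 2. Calling collapse_rows(df, keys, values) on the df built above should reduce its 37,500 rows to 12,500, although aggregate may be slow on much larger data.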

R - Merging rows in a data frame to fill in NAs, given multiple identifiers

0 answers: