Question

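How can I write R code that sums the values of one variable for every distinct combination of two other variables? For example, I want to sum all pop values for cd 403 and county 4017, and, separately, all pop values for cd 406 and county 4017.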

cd  county  pop
403 4017    1474
403 4017    0
403 4017    869
403 4017    393
403 4017    773
403 4017    1108
403 4017    929
403 4017    730
403 4017    0
406 4017    0
406 4017    2982
406 4017    1254
406 4017    752
406 4017    153
406 4017    0
406 4017    0
406 4017    3775
406 4017    0
406 4017    777
406 4017    5923

If a question on this topic has already been answered, what keywords should I search for on Google?

Thanks in advance!

Answer 1

library(plyr)
# df is assumed to be the data frame from the question (columns cd, county, pop)
ddply(df, .(cd, county), summarize, total = sum(pop))

   cd county total
1 403   4017  6276
2 406   4017 15616
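
For anyone who would rather not install a package, the same grouping can probably be done with base R's aggregate(). This is only a sketch and not part of the answer above: it assumes the question's data is rebuilt in a data frame called df, and the column name total is chosen just to match the output shown.

```r
# Rebuild the data from the question (assumed to live in a data frame named df)
df <- data.frame(
  cd     = c(rep(403, 9), rep(406, 11)),
  county = rep(4017, 20),
  pop    = c(1474, 0, 869, 393, 773, 1108, 929, 730, 0,
             0, 2982, 1254, 752, 153, 0, 0, 3775, 0, 777, 5923)
)

# Sum pop within each (cd, county) combination using the formula interface
totals <- aggregate(pop ~ cd + county, data = df, FUN = sum)

# Rename the summed column so it matches the ddply output
names(totals)[names(totals) == "pop"] <- "total"
print(totals)
```

The formula pop ~ cd + county reads as "pop grouped by cd and county", so adding more grouping columns only means extending the right-hand side.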

Answer 2

The answer given by @Troy is probably what most R users would tell you (i.e., use plyr and ddply()).

However, since my first exposure to data analysis was through database scripting, I still lean toward sqldf for these kinds of tasks.

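I also find SQL to be more transparent to non-R users (something I run into often in the social science community, where I do most of my work).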

Here is a solution to the problem using sqldf that produces the same output:

#your data assigned to dat
pop <- c(1474,0,869,393,773,1108,929,730,0
        ,0,2982,1254,752,153,0,0,3775,0
        ,777,5923)  
cd <- c(rep(403, 9), rep(406, 11))
county <- rep(4017, 20)

dat <- data.frame(cd, county, pop)

#load sqldf
library(sqldf)

#write a simple SQL aggregate query
#i.e. "select" your fields specifying the aggregate function for the 
#relevant field, "from" a table called dat, and "group by" cd and county
sqldf('select
        cd
        ,county
        ,sum(pop) as total
      from dat
      group by 
        cd
        ,county')

   cd county total
1 403   4017  6276
2 406   4017 15616
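
If you want to sanity-check those totals without sqldf, plain subsetting should give the same numbers. A minimal sketch, assuming the same dat as above (rebuilt here so the snippet runs on its own); total_403 and total_406 are just illustrative names:

```r
# Rebuild the example data with the same values as in the answer
pop <- c(1474, 0, 869, 393, 773, 1108, 929, 730, 0,
         0, 2982, 1254, 752, 153, 0, 0, 3775, 0, 777, 5923)
cd <- c(rep(403, 9), rep(406, 11))
county <- rep(4017, 20)
dat <- data.frame(cd, county, pop)

# Sum pop for each (cd, county) pair by selecting the matching rows directly
total_403 <- sum(dat$pop[dat$cd == 403 & dat$county == 4017])
total_406 <- sum(dat$pop[dat$cd == 406 & dat$county == 4017])
print(c(total_403, total_406))
```

This only scales to a handful of groups, which is why a GROUP BY style query is the better fit for real data.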

How do I sum row values based on two different columns in R?