Sort by year, then count a dummy variable per county (ddply)

Time: 2014-11-23 03:21:44

Tags: r dataframe plyr

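I am building a summary table for a replication file and have run into a problem getting ddply to do the job.

I need to sort by year and then count, for each year, how many counties (fips) have the dummy variable equal to 1 (RVPI == 1).

Here is a small sample of my data: https://www.dropbox.com/s/saem1xasai9t1zv/data2.csv?dl=0

Here is my code so far:

Note that RVPI_Counties currently counts every observation in that year rather than counting counties.

Thanks for any help you can offer.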

library(plyr)  # provides ddply() and summarize()
data <- read.csv("/.../data2.csv")

ddply(data, .(year), summarize,
      Observations = length(year),
      Total_Monitors = length(unique(indivID)),
      Urban = round(length(urban == 1) / length(unique(fips))),
      Counties = length(unique(fips)),
      RVPI_Counties = sum(RVPI == 1))  #Reports total and doesn't account for counties(fips)

Current output:

   year Observations Total_Monitors Urban Counties RVPI_Counties
1  1989          147              2    74        2           147
2  1990          209              4    52        4           209
3  1991          554              7    79        7           554
4  1992          389              5    78        5             0
5  1993          422              5    84        5             0
6  1994          397              5    79        5             0
7  1995          453              5    91        5             0
8  1996          453              5    91        5             0
9  1997          391              5    78        5             0
10 1998          454              5    91        5             0
11 1999          451              5    90        5             0
12 2000          444              5    89        5             0
13 2001          449              5    90        5             0
14 2002          458              5    92        5             0
15 2003          454              5    91        5             0

2 answers:

Answer 0 (score: 2)

If you are not set on ddply, here is a data.table solution.

data <- read.csv("data2.csv")
library(data.table)
setDT(data)[,list(Observations=.N,
                Total_Monitors=length(unique(indivID)),
                Urban         =round(sum(urban==1)/length(unique(fips))),
                Counties      =length(unique(fips)),
                RVPI_Counties =length(unique(fips[RVPI==1]))),
          by=year]
#     year Observations Total_Monitors Urban Counties RVPI_Counties
#  1: 1991          554              7    25        7             7
#  2: 1992          389              5    17        5             0
#  3: 1993          422              5    18        5             0
#  4: 1994          397              5    18        5             0
#  5: 1995          453              5    18        5             0
#  6: 1996          453              5    18        5             0
#  7: 1997          391              5    18        5             0
#  8: 1998          454              5    18        5             0
#  9: 1999          451              5    17        5             0
# 10: 2000          444              5    18        5             0
# 11: 2001          449              5    18        5             0
# 12: 2002          458              5    18        5             0
# 13: 2003          454              5    18        5             0
# 14: 1990          209              4    23        4             4
# 15: 1989          147              2    46        2             2
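
To see why `length(unique(fips[RVPI==1]))` gives a county count where `sum(RVPI == 1)` gives an observation count, here is a minimal base R sketch. The `toy` data frame and its values are made up for illustration and are not taken from data2.csv.

```r
# Toy data (hypothetical values, not from data2.csv):
# two counties observed in 1989, one county in 1990; RVPI is a 0/1 dummy.
toy <- data.frame(
  year = c(1989, 1989, 1989, 1989, 1990, 1990),
  fips = c(1001, 1001, 1003, 1003, 1001, 1001),
  RVPI = c(1,    1,    1,    0,    0,    0)
)

by_year <- split(toy, toy$year)

# Row-level count: every observation with RVPI == 1 is counted.
rows_with_rvpi <- sapply(by_year, function(d) sum(d$RVPI == 1))

# County-level count: keep the fips codes of the flagged rows,
# then count the distinct codes.
counties_with_rvpi <- sapply(by_year,
                             function(d) length(unique(d$fips[d$RVPI == 1])))

print(rows_with_rvpi)
print(counties_with_rvpi)
```

The same expression should also work in place of `sum(RVPI == 1)` inside the original `ddply(..., summarize, ...)` call. If RVPI can contain missing values, `fips[RVPI == 1]` will include `NA` entries that `unique()` counts as a value; indexing with `which(RVPI == 1)` avoids that.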

Answer 1 (score: 1)

For larger datasets I would also use dplyr (similar to @jlhoward's answer):

data <- read.csv('data2.csv')
library(dplyr)
data %>% 
       group_by(year) %>%
       summarise(Observations=n(), 
            Total_Monitors=n_distinct(indivID),#n_distinct contributed by @beginneR
            Urban=round(length(urban==1)/n_distinct(fips)),
            Counties=n_distinct(fips),
            RVPI_Counties=length(unique(fips[RVPI==1])))


#   year Observations Total_Monitors Urban Counties RVPI_Counties
#1  1989          147              2    74        2             2
#2  1990          209              4    52        4             4
#3  1991          554              7    79        7             7
#4  1992          389              5    78        5             0
#5  1993          422              5    84        5             0
#6  1994          397              5    79        5             0
#7  1995          453              5    91        5             0
#8  1996          453              5    91        5             0
#9  1997          391              5    78        5             0
#10 1998          454              5    91        5             0
#11 1999          451              5    90        5             0
#12 2000          444              5    89        5             0
#13 2001          449              5    90        5             0
#14 2002          458              5    92        5             0
#15 2003          454              5    91        5             0
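
The Urban column here differs from the one in the data.table answer (74 versus 46 for 1989). This answer, like the question, uses `length(urban==1)`, while the data.table answer uses `sum(urban==1)`. In the 1989 row above, 147 observations divided by 2 counties rounds to 74, which suggests `length()` is simply returning the number of rows. Which of the two the asker intended is not stated; the sketch below, on a made-up vector, only shows how the two calls differ.

```r
# Hypothetical 0/1 dummy for five observations (not from data2.csv).
urban <- c(1, 0, 0, 1, 0)

# urban == 1 is a logical vector with one element per observation,
# so its length is just the number of observations.
n_rows <- length(urban == 1)

# sum() treats TRUE as 1 and FALSE as 0, so this counts the urban rows only.
n_urban <- sum(urban == 1)

print(c(n_rows = n_rows, n_urban = n_urban))
```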