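I am building a summary table for a replication and have run into a problem using ddply.
I need to group by year and then count how many counties (fips) in each year have the dummy variable equal to 1 (RVPI == 1).
Here is a small subset of my data: https://www.dropbox.com/s/saem1xasai9t1zv/data2.csv?dl=0
My code so far is shown below. Note that RVPI_Counties currently counts every observation in the year with RVPI == 1, not the number of counties.
Thanks for any help.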
library(plyr)  # provides ddply() and .()
data <- read.csv("/.../data2.csv")
ddply(data, .(year), summarize,
      Observations   = length(year),
      Total_Monitors = length(unique(indivID)),
      Urban          = round(length(urban == 1) / length(unique(fips))),
      Counties       = length(unique(fips)),
      RVPI_Counties  = sum(RVPI == 1))  # counts observations, not distinct counties (fips)
Current output:
year Observations Total_Monitors Urban Counties RVPI_Counties
1 1989 147 2 74 2 147
2 1990 209 4 52 4 209
3 1991 554 7 79 7 554
4 1992 389 5 78 5 0
5 1993 422 5 84 5 0
6 1994 397 5 79 5 0
7 1995 453 5 91 5 0
8 1996 453 5 91 5 0
9 1997 391 5 78 5 0
10 1998 454 5 91 5 0
11 1999 451 5 90 5 0
12 2000 444 5 89 5 0
13 2001 449 5 90 5 0
14 2002 458 5 92 5 0
15 2003 454 5 91 5 0
Answer 0 (score: 2)
If you are not tied to ddply, here is a data.table solution.
data <- read.csv("data2.csv")
library(data.table)
setDT(data)[, list(Observations   = .N,
                   Total_Monitors = length(unique(indivID)),
                   Urban          = round(sum(urban == 1) / length(unique(fips))),
                   Counties       = length(unique(fips)),
                   RVPI_Counties  = length(unique(fips[RVPI == 1]))),
            by = year]
# year Observations Total_Monitors Urban Counties RVPI_Counties
# 1: 1991 554 7 25 7 7
# 2: 1992 389 5 17 5 0
# 3: 1993 422 5 18 5 0
# 4: 1994 397 5 18 5 0
# 5: 1995 453 5 18 5 0
# 6: 1996 453 5 18 5 0
# 7: 1997 391 5 18 5 0
# 8: 1998 454 5 18 5 0
# 9: 1999 451 5 17 5 0
# 10: 2000 444 5 18 5 0
# 11: 2001 449 5 18 5 0
# 12: 2002 458 5 18 5 0
# 13: 2003 454 5 18 5 0
# 14: 1990 209 4 23 4 4
# 15: 1989 147 2 46 2 2
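The key change is in RVPI_Counties: subsetting fips by RVPI == 1 and counting the unique values gives distinct counties, whereas sum(RVPI == 1) gives rows. A minimal base R sketch of the difference, using made-up toy values (the data frame `toy` and its contents are assumptions for illustration, not taken from data2.csv):

```r
# Toy data: hypothetical values, not from data2.csv
toy <- data.frame(
  year = c(1989, 1989, 1989, 1990, 1990),
  fips = c(1001, 1001, 1003, 1001, 1003),
  RVPI = c(1, 1, 1, 0, 0)
)

# One count per observation: rows where RVPI == 1, by year
rows_flagged <- tapply(toy$RVPI == 1, toy$year, sum)

# One count per county: distinct fips with at least one RVPI == 1 row, by year
counties_flagged <- sapply(
  split(toy, toy$year),
  function(d) length(unique(d$fips[d$RVPI == 1]))
)

print(rows_flagged)
print(counties_flagged)
```

In this toy data, 1989 has three flagged rows but only two flagged counties, because county 1001 appears twice.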
Answer 1 (score: 1)
I would also use dplyr for larger datasets (similar to @jlhoward's approach).
data <- read.csv('data2.csv')
library(dplyr)
data %>%
  group_by(year) %>%
  summarise(Observations   = n(),
            Total_Monitors = n_distinct(indivID),  # n_distinct contributed by @beginneR
            # length(urban == 1) counts every row, as in the question's code
            Urban          = round(length(urban == 1) / n_distinct(fips)),
            Counties       = n_distinct(fips),
            RVPI_Counties  = length(unique(fips[RVPI == 1])))
# year Observations Total_Monitors Urban Counties RVPI_Counties
#1 1989 147 2 74 2 2
#2 1990 209 4 52 4 4
#3 1991 554 7 79 7 7
#4 1992 389 5 78 5 0
#5 1993 422 5 84 5 0
#6 1994 397 5 79 5 0
#7 1995 453 5 91 5 0
#8 1996 453 5 91 5 0
#9 1997 391 5 78 5 0
#10 1998 454 5 91 5 0
#11 1999 451 5 90 5 0
#12 2000 444 5 89 5 0
#13 2001 449 5 90 5 0
#14 2002 458 5 92 5 0
#15 2003 454 5 91 5 0
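The Urban column here matches the question's output but differs from the data.table answer (for example 79 versus 25 for 1991). That appears to come from length(urban == 1) versus sum(urban == 1): the first returns the length of the logical vector, so every row is counted, while the second counts only the TRUE values. A small sketch with a hypothetical indicator vector (the values are an assumption for illustration):

```r
# Hypothetical indicator values, not from data2.csv
urban <- c(1, 0, 0, 1, 0)

# Length of the logical vector: counts every element, TRUE or FALSE
n_rows <- length(urban == 1)

# Sum of the logical vector: counts only elements equal to 1
n_urban <- sum(urban == 1)

print(c(n_rows = n_rows, n_urban = n_urban))
```

If urban can contain missing values, sum(urban == 1, na.rm = TRUE) avoids an NA result.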