This is more of a code cleanup exercise for something I'm working on right now. My initial data looks like this:
    Year  County  Town  ...  Funding Received  ...  (90+ Variables total)
    2016  a       x          Yes
    2015  a       y          No
    2014  a       x          Yes
    2016  b       z          Yes
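I couldn't see how to get the number of applications submitted and approved from this, so I converted the county column into indicator variables and computed them with the following code: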
    counties <- original_data %>%
      select(county, funded, year) %>%
      mutate(
        a = ifelse(county == "a", 1, 0),
        b = ifelse(county == "b", 1, 0),
        c = ifelse(county == "c", 1, 0),
        # ... etc ...
      )
The output looks like:
    County  Funding Received  Year  binary.a  binary.b
    a       Yes               2016  1         0
    a       No                2015  1         0
    b       No                2016  0         1
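I then turned this data into two data frames (submitted and funded) to count the applications submitted and funded per county per year, using the following code: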
    countysum <- counties %>%
      select(-funded) %>%
      group_by(county, year) %>%
      summarise_all(sum, na.rm = TRUE)
The output looks like this:
    County  Year  sum.a  sum.b
    a       2016  32     0
    a       2015  24     0
    b       2016  0      16
But to get the data into a tidier format, I used a few more commands:
    countysum$submitted <- rowSums(countysum[, 3:15], na.rm = TRUE) # columns 3:15 are the county indicator vars
    countysum <- countysum[,-c(3:19)]
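Now my question: is there a way to reduce all of these operations to a single pipe? Right now I have code that works, but I'd prefer code that works and is also easier to follow. Sorry for the lack of data; I can't share it.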
Answer 0 (score: 0)
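I'm not sure I fully understand what your final desired output looks like, but I think you can take advantage of the fact that logical values are coerced to integers and skip creating the dummy columns.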
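To make that coercion concrete, here is a tiny sketch on a made-up vector (the values are invented for illustration, not taken from your data):

```r
# Hypothetical stand-in for one group's `funded` column, including a missing value
funded <- c("Yes", "No", "Yes", NA)

# Comparing against "Yes" gives a logical vector: TRUE FALSE TRUE NA
is_funded <- funded == "Yes"

# sum() treats TRUE as 1 and FALSE as 0, so this counts the "Yes" entries
n_funded <- sum(is_funded, na.rm = TRUE)

# any() answers "was there at least one Yes?" for the group
was_funded <- any(is_funded, na.rm = TRUE)
```

The same trick carries over to a grouped summary: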
    library(dplyr)

    byyear <- original_data %>%
      group_by(county, year) %>%
      summarize(
        wasfunded = any(funded == "Yes", na.rm = TRUE)
        , submittedapplication = any(submittedapp == "Yes", na.rm = TRUE) # I'm assuming did/didn't submit is one of the other variables
      )
    # if you don't need the byyear data for something else (I always seem to),
    # you can pipe that straight into this next line
    yrs_funded_by_county <- byyear %>%
      summarize(
        n_yrs_funded = sum(wasfunded)
        , n_yrs_submitted = sum(submittedapplication)
        , pct_awarded = n_yrs_funded / n_yrs_submitted # maybe you don't need an award rate, but I threw it in b/c it's the kind of stuff my grant person cares about
      )
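If what you actually want is the number of applications per county per year rather than the number of years, a single pipe along these lines might do it. This is only a sketch under an assumption I can't verify: that every row of `original_data` is one submitted application. The toy data frame and the output names `n_submitted` and `n_funded` are made up for illustration.

```r
library(dplyr)

# Toy stand-in for original_data (hypothetical values; the real data was not shared)
original_data <- data.frame(
  year   = c(2016, 2016, 2015, 2014, 2016, 2016),
  county = c("a", "a", "a", "a", "b", "b"),
  funded = c("Yes", "No", "No", "Yes", "Yes", NA),
  stringsAsFactors = FALSE
)

county_year <- original_data %>%
  group_by(county, year) %>%
  summarise(
    # ASSUMPTION: one row per submitted application, so the row count is the submission count
    n_submitted = n(),
    # the logical comparison is coerced to 0/1, so sum() counts the funded applications
    n_funded = sum(funded == "Yes", na.rm = TRUE),
    .groups = "drop"
  )
```

This replaces the indicator columns, the `summarise_all()` step, and the `rowSums()` cleanup with one grouped summary. Note that `.groups = "drop"` needs dplyr 1.0.0 or later; on older versions, leave it out and add `ungroup()` at the end of the pipe.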