I have a CSV file that looks like this:
Year, Answer, Total
2017, Yes, 100
2017, No, 10
2017, Yes, 100
2018, No, 40
2018, Yes, 200
I'm trying to create a column that computes the ratio of "No" to "Yes" within a given year, so the result would look like this:
Year, Answer, Total, Ratio
2017, Yes, 100, 1
2017, No, 10, 0.05
2017, Yes, 100, 1
2018, No, 40, 0.2
2018, Yes, 200, 1
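I'm using R and dplyr. I think I need to create a column holding the "Yes" total for the given year (the values will repeat), then use an ifelse statement to create another column that is 1 for "Yes" rows and, for "No" rows, the "No" total divided by the "Yes" total. Is there a more efficient way to do this? Thanks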
Answer 0: (score: 2)
How about this?
library(dplyr)
xdf <- data.frame(
  stringsAsFactors = FALSE,
  Year = c(2017, 2017, 2017, 2018, 2018),
  Answer = c("Yes", "No", "Yes", "No", "Yes"),
  Total = c(100, 10, 100, 40, 200)
)
# After summarise(), rows within each Year are sorted by Answer, so "No"
# comes right before "Yes" and lead(Total) is that year's "Yes" total.
xdf %>%
  group_by(Year, Answer) %>%
  summarise(Total = sum(Total)) %>%
  mutate(share = if_else(Answer == "No", Total / lead(Total), 1))
#> # A tibble: 4 x 4
#> # Groups: Year [2]
#> Year Answer Total share
#> <dbl> <chr> <dbl> <dbl>
#> 1 2017 No 10 0.05
#> 2 2017 Yes 200 1
#> 3 2018 No 40 0.2
#> 4 2018 Yes 200 1
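Note that the summarise() step above collapses the two 2017 "Yes" rows into one. If the original rows need to be kept, as in the desired output in the question, a grouped mutate() may be an option. This is only a sketch under that assumption: the column name Ratio is borrowed from the question, the name result is made up for illustration, and xdf is rebuilt so the block runs on its own.

```r
library(dplyr)

# Same sample data as in the question
xdf <- data.frame(
  stringsAsFactors = FALSE,
  Year = c(2017, 2017, 2017, 2018, 2018),
  Answer = c("Yes", "No", "Yes", "No", "Yes"),
  Total = c(100, 10, 100, 40, 200)
)

# Keep every row; within each Year, "No" rows get (sum of No) / (sum of Yes)
# and all other rows get 1.
result <- xdf %>%
  group_by(Year) %>%
  mutate(
    Ratio = if_else(
      Answer == "No",
      sum(Total[Answer == "No"]) / sum(Total[Answer == "Yes"]),
      1
    )
  ) %>%
  ungroup()

print(result)
```

A year without any "Yes" rows would divide by zero here and give Inf for its "No" rows, so that case would need separate handling.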
Answer 1: (score: 0)
Here is an approach using a custom function
# function calculating the No/Yes ratio for the year of row k
# (reads the global data frame `df`; every row of a year, "yes" rows
# included, gets the same ratio)
f1 <- function(k) {
  ind.yes <- intersect(which(df$year == df$year[k]),
                       which(df$answer == "yes"))
  ind.no <- intersect(which(df$year == df$year[k]),
                      which(df$answer == "no"))
  total.yes <- sum(df$total[ind.yes])
  total.no <- sum(df$total[ind.no])
  ratio.no.yes <- total.no / total.yes
  return(ratio.no.yes)
}
# applying f1 to every row index with vapply
ratios <- vapply(seq_len(nrow(df)), f1, numeric(1))
# binding the ratios to the data
df$ratios <- ratios
Here is the result (using a dummy data frame)
df <- data.frame(
  year = sample(2015:2018, 10, replace = TRUE),
  answer = sample(c("yes", "no"), 10, replace = TRUE),
  total = sample(10:200, 10, replace = TRUE),
  stringsAsFactors = FALSE)
ratios <- vapply(1:nrow(df), f1, numeric(1))
df$ratios <- ratios
# printing
> df
year answer total ratios
1 2015 yes 76 0.08294931
2 2017 yes 43 2.55263158
3 2018 yes 63 0.00000000
4 2016 yes 61 0.83606557
5 2015 no 18 0.08294931
6 2017 no 142 2.55263158
7 2017 yes 33 2.55263158
8 2015 yes 141 0.08294931
9 2016 no 51 0.83606557
10 2017 no 52 2.55263158
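The same per-year ratio can probably also be computed without a row-by-row function. The following is a sketch that swaps in base R's ave() for the custom f1; it is not the approach above. The data frame is a hand-picked subset of the 2015 and 2016 rows from the printed output, and the names no.sum and yes.sum are made up for illustration.

```r
# Subset of the rows printed above (2015 and 2016 only)
df <- data.frame(
  year = c(2015, 2015, 2016, 2016, 2015),
  answer = c("yes", "no", "yes", "no", "yes"),
  total = c(76, 18, 61, 51, 141),
  stringsAsFactors = FALSE
)

# ave() returns the per-group sum repeated for every row of that group.
# Multiplying by the logical test zeroes out rows of the other answer.
no.sum <- ave(df$total * (df$answer == "no"), df$year, FUN = sum)
yes.sum <- ave(df$total * (df$answer == "yes"), df$year, FUN = sum)

# Same No/Yes ratio for every row of a year, as f1 produces
df$ratios <- no.sum / yes.sum
print(df)
```

For these rows the 2015 ratio is 18/217 and the 2016 ratio is 51/61, which agrees with the 0.08294931 and 0.83606557 shown in the output above.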
Answer 2: (score: 0)
I don't think efficiency matters much here. You can write it as a one-liner, though it is hard to read:
DF %>%
  group_by(Year) %>%
  mutate(v = (Total / sum(Total[Answer == "Yes"]))^(Answer == "No"))
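This x^cond trick assigns the desired value 1 whenever Answer != "No", because x^FALSE = x^0 = 1. For "No" rows, x^TRUE = x^1 = x, so the ratio is kept as is.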
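To see the exponent trick in isolation, here is a small base R sketch. The vectors x and is.no are hypothetical stand-ins for Total / sum(Total[Answer == "Yes"]) and Answer == "No" on the question's sample data.

```r
# Per-row Total divided by that year's "Yes" total (sample data from the question)
x <- c(100 / 200, 10 / 200, 100 / 200, 40 / 200, 200 / 200)
# TRUE where the row's Answer is "No"
is.no <- c(FALSE, TRUE, FALSE, TRUE, FALSE)

# The logical exponent is coerced to 0 or 1: x^0 is 1, x^1 is x
v <- x^is.no

# A more readable equivalent using ifelse()
v.ifelse <- ifelse(is.no, x, 1)

print(v)
print(v.ifelse)
```

Both vectors come out the same, so the ifelse() form is a reasonable choice when readability matters more than brevity.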