Question

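When I use reshape2, I get a data frame with columns named TRUE and FALSE. The problem comes when I try to compute the proportion of TRUE values with dplyr: inside mutate, TRUE is read as the logical constant (which is coerced to 1 in arithmetic), not as the column named TRUE.

What is the natural way to solve this?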

require(reshape2)
require(plyr)
require(dplyr)

transplants <- data.frame(donor_region = c(1, 1, 1, 2, 2, 2),
    recipient_region = c(1, 1, 2, 1, 2, 2)) %>%
    mutate(is_self = donor_region == recipient_region)

x <- ddply(transplants, .(donor_region, is_self), summarise,
    freq = length(is_self))
x %>% print

# Compute the proportion of transplants with is_self == TRUE
y <- dcast(x, donor_region ~ is_self, value.var = 'freq') %>%
    mutate(true_proportion = TRUE / (FALSE + TRUE))
y %>% print

# What I get:
#   donor_region FALSE TRUE true_proportion
# 1            1     1    2               1
# 2            2     1    2               1

# What I want to get:
#   donor_region FALSE TRUE true_proportion
# 1            1     1    2       0.6666667
# 2            2     1    2       0.6666667

Answer 1

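I have combined the answers given in the comments by @thelatemail and @jenesaisquoi, since the comments section is not the best place to keep answers.

Using backticks (@thelatemail):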

dcast(x, donor_region ~ is_self, value.var = 'freq') %>%
  mutate(true_proportion = `TRUE` / (`FALSE` + `TRUE`))
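
To see why the backticks matter, here is a minimal base R sketch. It assumes a hand-built data frame y shaped like the dcast() output (the values are copied from the question, and check.names = FALSE is needed so data.frame() keeps the non-syntactic column names). The [[ "TRUE" ]] form is an extra alternative I'm adding for comparison; it was not in the comments.

```r
# A small wide table shaped like the dcast() output in the question.
# check.names = FALSE keeps the non-syntactic names "FALSE" and "TRUE".
y <- data.frame(donor_region = c(1, 2),
                `FALSE` = c(1, 1),
                `TRUE` = c(2, 2),
                check.names = FALSE)

# Bare TRUE and FALSE are logical constants, so this is 1 / (0 + 1).
bare <- TRUE / (FALSE + TRUE)

# Backticks make R parse the names as symbols, which are then
# looked up as columns of y.
quoted <- with(y, `TRUE` / (`FALSE` + `TRUE`))

# Indexing by string sidesteps the parsing question entirely.
indexed <- y[["TRUE"]] / (y[["FALSE"]] + y[["TRUE"]])

print(bare)
print(quoted)
print(indexed)
```

The bare version gives a single 1, which is then recycled down the column, and that is exactly the wrong result shown in the question. The other two give 2/3 for each donor region.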

Using a weighted mean (@jenesaisquoi):

x %>% group_by(donor_region) %>% summarise(tp = weighted.mean(is_self, freq))
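
The weighted mean works on the long table directly, so the TRUE/FALSE columns never get created. As a rough check of the arithmetic, here is a base R sketch that assumes x is rebuilt by hand with the counts from the question, and uses split() and sapply() in place of group_by() and summarise():

```r
# The long-format counts produced by ddply() in the question, typed in by hand.
x <- data.frame(donor_region = c(1, 1, 2, 2),
                is_self = c(FALSE, TRUE, FALSE, TRUE),
                freq = c(1, 2, 1, 2))

# is_self is coerced to 0/1, so the weighted mean per region is
# (0 * freq_FALSE + 1 * freq_TRUE) / (freq_FALSE + freq_TRUE).
tp <- sapply(split(x, x$donor_region),
             function(d) weighted.mean(d$is_self, d$freq))

print(tp)
```

For each region this is (0 * 1 + 1 * 2) / (1 + 2) = 2/3, the same proportion as in the wanted output.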
