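I am trying to move some work I used to do in Excel into R. All I need to do is convert two basic COUNTIF formulas into a readable R script. In Excel I would use three tables and calculate across them by pointing and clicking, but I am lost on how to handle these tables in R.
My original data frames are large, so for this question I have posted sample data frames: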
OperatorData <- data.frame(
Operator = c("A","B","C"),
Locations = c(850, 575, 2175)
)
AreaData <- data.frame(
Area = c("Torbay","Torquay","Tooting","Torrington","Taunton","Torpley"),
SumLocations = c(1000,500,500,250,600,750)
)
OperatorAreaData <- data.frame(
Operator = c("A","A","A","B","B","B","C","C","C","C","C"),
Area = c("Torbay","Tooting","Taunton",
"Torbay","Taunton","Torrington",
"Tooting","Torpley","Torquay","Torbay","Torrington"),
Locations = c(250,400,200,
100,400,75,
100,750,500,650,175)
)
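What I want to do is add two new columns to the OperatorData data frame: one giving the number of areas the operator operates in, and another giving the number of areas in which the operator owns more than 50% of the locations.
The resulting data frame would therefore look like this: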
Operator Locations AreaCount Own_GE_50percent
A 850 3 1
B 575 3 1
C 2175 5 4
So far I have managed to compute the first column with the table function and then append it:
OpAreaCount <- data.frame(table(OperatorAreaData$Operator))
names(OpAreaCount)[2] <- "AreaCount"
# Assign as a plain vector; this relies on both data frames listing operators in the same order
OperatorData$AreaCount <- OpAreaCount$AreaCount
That was fairly simple, but I am stuck on how to compute the second column with the 50% condition.
Answer 0 (score: 1)
library(dplyr)
OperatorAreaData %>%
inner_join(AreaData, by="Area") %>%
group_by(Operator) %>%
summarise(AreaCount = n_distinct(Area),
Own_GE_50percent = sum(Locations > (SumLocations/2)))
# # A tibble: 3 x 3
# Operator AreaCount Own_GE_50percent
# <fct> <int> <int>
# 1 A 3 1
# 2 B 3 1
# 3 C 5 4
You can use AreaCount = n() instead if you are sure that each Operator has unique Area values.
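The summary above has one row per operator but does not carry the Locations column from OperatorData. As a minimal sketch, assuming the sample data from the question, the two new columns could be attached by joining on Operator instead of relying on row order. The names OperatorSummary and Result are hypothetical, chosen only for this illustration.

```r
library(dplyr)

# Sample data copied from the question
OperatorData <- data.frame(
  Operator = c("A", "B", "C"),
  Locations = c(850, 575, 2175),
  stringsAsFactors = FALSE
)
AreaData <- data.frame(
  Area = c("Torbay", "Torquay", "Tooting", "Torrington", "Taunton", "Torpley"),
  SumLocations = c(1000, 500, 500, 250, 600, 750),
  stringsAsFactors = FALSE
)
OperatorAreaData <- data.frame(
  Operator = c("A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "C"),
  Area = c("Torbay", "Tooting", "Taunton",
           "Torbay", "Taunton", "Torrington",
           "Tooting", "Torpley", "Torquay", "Torbay", "Torrington"),
  Locations = c(250, 400, 200,
                100, 400, 75,
                100, 750, 500, 650, 175),
  stringsAsFactors = FALSE
)

# Per-operator summary, same logic as the answer (OperatorSummary is a hypothetical name)
OperatorSummary <- OperatorAreaData %>%
  inner_join(AreaData, by = "Area") %>%
  group_by(Operator) %>%
  summarise(
    AreaCount = n_distinct(Area),
    # counts areas where the operator holds strictly more than half of the locations
    Own_GE_50percent = sum(Locations > SumLocations / 2)
  )

# Attach the summary to OperatorData by key (Result is a hypothetical name)
Result <- OperatorData %>%
  left_join(OperatorSummary, by = "Operator")

print(Result)
```

With the sample data this should give the table requested in the question. Note that the comparison is strict (`>`), so an area where an operator holds exactly 50% would not be counted. If the column name is meant literally, `>=` may be the intended test.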