Calculations across more than two different data frames in R

Time: 2018-10-11 09:48:07

Tags: r cbind

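I am trying to move some work I used to do in Excel into R. All I need to do is convert two basic count_if formulas into a readable R script. In Excel I would use three tables and calculate across them with a "point and click" approach, but I am lost as to how to handle this in R.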

My original data frames are large, so for this question I have posted sample data frames:

OperatorData <- data.frame(
                    Operator = c("A","B","C"),
                    Locations = c(850, 575, 2175)
 )

AreaData <- data.frame(
              Area = c("Torbay","Torquay","Tooting","Torrington","Taunton","Torpley"),
              SumLocations = c(1000,500,500,250,600,750)
 )

OperatorAreaData <- data.frame(
              Operator = c("A","A","A","B","B","B","C","C","C","C","C"),
              Area = c("Torbay","Tooting","Taunton",
                       "Torbay","Taunton","Torrington",
                       "Tooting","Torpley","Torquay","Torbay","Torrington"),
              Locations = c(250,400,200,
                            100,400,75,
                            100,750,500,650,175)
 )

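What I want to do is add two new columns to the OperatorData data frame: one giving the number of areas in which the operator operates, and another giving the number of areas in which the operator operates and owns more than 50% of the locations.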

So the resulting data frame would look like this:

Operator     Locations   AreaCount    Own_GE_50percent
A            850         3            1
B            575         3            1
C            2175        5            4

So far I have managed to compute the first column using the table function and then append it:

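# Count the rows (one per area) for each operator
OpAreaCount <- data.frame(table(OperatorAreaData$Operator))
names(OpAreaCount)[2] <- "AreaCount"
# Append by position; this assumes OperatorData lists the operators in the same order
OperatorData$AreaCount <- OpAreaCount$AreaCount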

That was straightforward, but I am stuck on how to compute the second column with the 50% condition.

1 Answer:

Answer 0: (score: 1)

library(dplyr)

OperatorAreaData %>%
  inner_join(AreaData, by="Area") %>%
  group_by(Operator) %>%
  summarise(AreaCount = n_distinct(Area),
            Own_GE_50percent = sum(Locations > (SumLocations/2)))

# # A tibble: 3 x 3
#   Operator AreaCount Own_GE_50percent
#   <fct>        <int>            <int>
# 1 A                3                1
# 2 B                3                1
# 3 C                5                4
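The summary above has one row per operator but does not carry the Locations column from OperatorData. One way to get the four-column data frame asked for in the question is to join the summary back onto OperatorData by Operator. This is a minimal sketch rather than part of the original answer; the names OperatorSummary and Result are made up for illustration.

```r
library(dplyr)

# Sample data from the question
OperatorData <- data.frame(
  Operator = c("A", "B", "C"),
  Locations = c(850, 575, 2175)
)
AreaData <- data.frame(
  Area = c("Torbay", "Torquay", "Tooting", "Torrington", "Taunton", "Torpley"),
  SumLocations = c(1000, 500, 500, 250, 600, 750)
)
OperatorAreaData <- data.frame(
  Operator = c("A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "C"),
  Area = c("Torbay", "Tooting", "Taunton",
           "Torbay", "Taunton", "Torrington",
           "Tooting", "Torpley", "Torquay", "Torbay", "Torrington"),
  Locations = c(250, 400, 200,
                100, 400, 75,
                100, 750, 500, 650, 175)
)

# Per-operator summary, computed as in the answer
# (OperatorSummary is a hypothetical name)
OperatorSummary <- OperatorAreaData %>%
  inner_join(AreaData, by = "Area") %>%
  group_by(Operator) %>%
  summarise(AreaCount = n_distinct(Area),
            Own_GE_50percent = sum(Locations > SumLocations / 2))

# Attach the two new columns to OperatorData, matching rows on Operator
# instead of relying on row order (Result is a hypothetical name)
Result <- OperatorData %>%
  left_join(OperatorSummary, by = "Operator")

print(Result)
```

Matching on Operator means the result does not depend on the two data frames listing the operators in the same order, which appending by position does.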

If you are sure that each Operator has unique Area values, you can use AreaCount = n() instead.
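To see why uniqueness matters here, the sketch below uses a small made-up data frame (Dup, not part of the question's data) in which operator A is listed twice for Torbay. n() counts rows, so it counts the repeated area twice, while n_distinct(Area) counts each area once.

```r
library(dplyr)

# Hypothetical data: operator A appears twice for Torbay
Dup <- data.frame(
  Operator = c("A", "A", "A", "B"),
  Area = c("Torbay", "Torbay", "Tooting", "Torbay")
)

Counts <- Dup %>%
  group_by(Operator) %>%
  summarise(Rows = n(),                        # rows per operator, duplicates included
            DistinctAreas = n_distinct(Area))  # each area counted once per operator

print(Counts)
# Operator A: Rows = 3, DistinctAreas = 2
# Operator B: Rows = 1, DistinctAreas = 1
```

In the question's sample data every Operator and Area pair appears only once, so both functions give the same AreaCount there.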