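Incorrect result after filtering or subsetting

Time: 2018-03-10 18:51:28

Tags: r google-analytics

I'm running into a problem after filtering or subsetting a data frame by date.

I use the googleAnalyticsR package to pull AdWords cost data (adCost) from the GA API.

The adCost total matches GA itself. But after subsetting by month, I get a different result.

This is a plain API call that returns a data frame with two columns: date and adCost. Its result matches the data shown in the GA platform.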

start_date <- "2018-01-01"

final_date <- "2018-02-28"


data <- google_analytics(view_id,
                         date_range = c(start_date, final_date),
                         metrics = c("adCost"),
                         dimensions = c("date"),
                         anti_sample = TRUE)

Expected output of sum(data$adCost):

20632.19

Result:

20632.19

But if I group or filter the data down to a single month (February, for example), I don't get the figure that GA shows in the platform.

data_feb <- data %>%
            filter(date >= "2018-02-01", date <= "2018-02-28")
            #subset(date >= "2018-02-01", date <= "2018-02-28") gives same incorrect result

Expected output of sum(data_feb$adCost):

10703.57

Returns:

10537.1

I even tried using months() to put the month in a new column and filter by month name, but again the result didn't match.

data$month <- months(data$date, abbreviate = T)

data_feb <- data %>%
            filter(month == "feb")

Expected output of sum(data_feb$adCost):

10703.57

Returns:

10537.1
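A note on the months() attempt: months() returns month names in the session's locale (for example "Feb" in an English locale), so comparing against a hand-typed string such as "feb" is fragile and can match no rows at all. A minimal sketch of a locale-independent filter, using a hypothetical toy data frame (toy, with a made-up cost of 1 per day) rather than the real GA data:

```r
# Hypothetical toy data: one row per day, made-up cost of 1 per day
toy <- data.frame(
  date   = seq(as.Date("2018-01-01"), as.Date("2018-02-28"), by = "day"),
  adCost = 1
)

# format(date, "%m") gives the zero-padded month number, which does not
# depend on the locale, unlike the names returned by months()
toy_feb <- toy[format(toy$date, "%m") == "02", ]

nrow(toy_feb)        # 28 (days in February 2018)
sum(toy_feb$adCost)  # 28
```

If the data spans more than one year, "%Y-%m" would be the safer key, since "%m" alone would mix Februaries from different years.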

What could it be?

Data:

data <- structure(list(date = structure(c(17532, 17533, 17534, 17535, 
    17536, 17537, 17538, 17539, 17540, 17541, 17542, 17543, 17544, 
    17545, 17546, 17547, 17548, 17549, 17550, 17551, 17552, 17553, 
    17554, 17555, 17556, 17557, 17558, 17559, 17560, 17561, 17562, 
    17563, 17564, 17565, 17566, 17567, 17568, 17569, 17570, 17571, 
    17572, 17573, 17574, 17575, 17576, 17577, 17578, 17579, 17580, 
    17581, 17582, 17583, 17584, 17585, 17586, 17587, 17588, 17589, 
    17590), class = "Date"), adCost = c(0, 0, 212.788901, 201.660582, 
    677.926913, 526.440256, 522.998839, 135.469596, 234.080656, 173.389505, 
    299.499735, 234.691749, 235.785283, 534.545275, 19.136849, 290.011717, 
    545.737919, 730.416558, 550.047731, 508.84722, 246.463323, 315.741935, 
    310.338589, 417.858737, 312.525658, 4.953066, 189.020612, 724.337794, 
    65.547729, 199.248374, 675.579031, 374.50332, 429.758963, 624.922665, 
    137.785316, 238.551281, 471.924357, 353.758332, 176.251992, 355.109168, 
    0, 0, 178.406897, 491.44716, 540.624039, 601.797631, 543.518688, 
    254.214552, 264.345825, 240.127257, 781.458877, 704.10741, 650.427743, 
    355.109168, 181.719663, 178.246083, 356.202702, 501.385456, 551.398567
    )), .Names = c("date", "adCost"), row.names = c(NA, 59L), class = "data.frame", totals = list(
        structure(list(adCost = "20632.193244"), .Names = "adCost")), minimums = list(
        structure(list(adCost = "0.0"), .Names = "adCost")), maximums = list(
        structure(list(adCost = "781.458877"), .Names = "adCost")), isDataGolden = TRUE, rowCount = 59L)

2 answers:

Answer 0: (score: 0)

Your approach looks correct. For this data, though, sum(data_feb$adCost) should in fact be 10537.1. You can quickly verify that with the following.

data_feb <- data %>%
  filter(date >= "2018-02-01", date <= "2018-02-28")

data_Jan <- data %>%
  filter(date < "2018-02-01")

sum(data_Jan$adCost)

[1] 10095.09

sum(data_Jan$adCost) + sum(data_feb$adCost)

[1] 20632.19 
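The same check can be done for every calendar month in one pass. This is a minimal base R sketch that rebuilds the question's data frame from the posted adCost values; the seq() call assumes one row per consecutive day starting 2018-01-01, which is what the posted dates show. The names ad_cost and monthly are introduced here for illustration only.

```r
# adCost values copied from the question's dput() output (59 daily rows)
ad_cost <- c(0, 0, 212.788901, 201.660582,
  677.926913, 526.440256, 522.998839, 135.469596, 234.080656, 173.389505,
  299.499735, 234.691749, 235.785283, 534.545275, 19.136849, 290.011717,
  545.737919, 730.416558, 550.047731, 508.84722, 246.463323, 315.741935,
  310.338589, 417.858737, 312.525658, 4.953066, 189.020612, 724.337794,
  65.547729, 199.248374, 675.579031, 374.50332, 429.758963, 624.922665,
  137.785316, 238.551281, 471.924357, 353.758332, 176.251992, 355.109168,
  0, 0, 178.406897, 491.44716, 540.624039, 601.797631, 543.518688,
  254.214552, 264.345825, 240.127257, 781.458877, 704.10741, 650.427743,
  355.109168, 181.719663, 178.246083, 356.202702, 501.385456, 551.398567)

data <- data.frame(
  date   = seq(as.Date("2018-01-01"), by = "day", length.out = length(ad_cost)),
  adCost = ad_cost
)

# Sum adCost per year-month key
monthly <- tapply(data$adCost, format(data$date, "%Y-%m"), sum)
round(monthly, 2)
#  2018-01  2018-02
# 10095.09 10537.10
```

The two monthly totals add up to the 20632.19 reported for the whole range, so the filter itself is not dropping or duplicating rows.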

Answer 1: (score: 0)

It says so right there:

  

start_date <- "2018-01-01"

Your dataset starts a month earlier, so the overall total also includes January. Here is a check.

data %>%
  group_by(date >= "2018-02-01") %>%
  summarise(sum = sum(adCost))

# # A tibble: 2 x 2
#   `date >= "2018-02-01"`    sum
#   <lgl>                   <dbl>
# 1 FALSE                  10095.
# 2 TRUE                   10537.
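To see how many daily rows fall on each side of that split, a small base R sketch along the same lines may help. It only uses the date range from the question; the vector name dates is made up for this example.

```r
# Daily dates covering the range requested in the question
dates <- seq(as.Date("2018-01-01"), as.Date("2018-02-28"), by = "day")

# First and last day actually present
range(dates)

# Rows before (FALSE) and from (TRUE) 1 February
split_counts <- table(dates >= as.Date("2018-02-01"))
split_counts
# FALSE  TRUE
#    31    28
```

The 31 January rows are what make the full-range sum larger than the February-only sum.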