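Problem after filtering or updating dates in a data frame.
I use the googleAnalyticsR package to pull AdWords cost data (adCost) from the GA API.
The total adCost matches GA itself, but after subsetting by month I get a different result.
This is an ordinary API call that returns a data frame with two columns: date and adCost. Its result matches the data in the GA platform.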
library(googleAnalyticsR)
library(dplyr)

# view_id holds the GA view ID and is defined elsewhere
start_date <- "2018-01-01"
final_date <- "2018-02-28"
data <- google_analytics(view_id,
                         date_range = c(start_date, final_date),
                         metrics = c("adCost"),
                         dimensions = c("date"),
                         anti_sample = TRUE)
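Expected output of sum(data$adCost):
20632.19
Result:
20632.19
But if I group or filter the data down to a single month (February, for example), I can't get the figure that GA shows in the platform.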
data_feb <- data %>%
  filter(date >= "2018-02-01", date <= "2018-02-28")
# subset(date >= "2018-02-01", date <= "2018-02-28") gives same incorrect result
Expected output of sum(data_feb$adCost):
10703.57
Returned:
10537.1
I even tried using months() to put the month in a new column and filter by month name, but again the result did not match.
data$month <- months(data$date, abbreviate = TRUE)
data_feb <- data %>%
  filter(month == "feb")
Expected output of sum(data_feb$adCost):
10703.57
Returned:
10537.1
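A note on the months() attempt: months() returns month names in the session locale (for example "Feb" in an English locale), so a comparison against a fixed string such as "feb" may or may not match depending on the machine. Below is a minimal sketch of a locale-independent alternative that uses format() with a numeric year-month pattern. The dates and costs are toy values chosen for illustration, not the GA data.

```r
# Toy data spanning the January/February boundary (hypothetical values).
dates <- as.Date(c("2018-01-30", "2018-01-31", "2018-02-01", "2018-02-28"))
cost  <- c(1, 2, 3, 4)

# "%Y-%m" yields a numeric year-month string, which does not depend on locale.
is_feb <- format(dates, "%Y-%m") == "2018-02"

# Sum only the February rows.
feb_total <- sum(cost[is_feb])
print(feb_total)
```

This only makes the month filter robust; it selects the same February rows as the date-range filter, so it would not by itself change the total.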
What could be going on?
Data:
data <- structure(list(date = structure(c(17532, 17533, 17534, 17535,
17536, 17537, 17538, 17539, 17540, 17541, 17542, 17543, 17544,
17545, 17546, 17547, 17548, 17549, 17550, 17551, 17552, 17553,
17554, 17555, 17556, 17557, 17558, 17559, 17560, 17561, 17562,
17563, 17564, 17565, 17566, 17567, 17568, 17569, 17570, 17571,
17572, 17573, 17574, 17575, 17576, 17577, 17578, 17579, 17580,
17581, 17582, 17583, 17584, 17585, 17586, 17587, 17588, 17589,
17590), class = "Date"), adCost = c(0, 0, 212.788901, 201.660582,
677.926913, 526.440256, 522.998839, 135.469596, 234.080656, 173.389505,
299.499735, 234.691749, 235.785283, 534.545275, 19.136849, 290.011717,
545.737919, 730.416558, 550.047731, 508.84722, 246.463323, 315.741935,
310.338589, 417.858737, 312.525658, 4.953066, 189.020612, 724.337794,
65.547729, 199.248374, 675.579031, 374.50332, 429.758963, 624.922665,
137.785316, 238.551281, 471.924357, 353.758332, 176.251992, 355.109168,
0, 0, 178.406897, 491.44716, 540.624039, 601.797631, 543.518688,
254.214552, 264.345825, 240.127257, 781.458877, 704.10741, 650.427743,
355.109168, 181.719663, 178.246083, 356.202702, 501.385456, 551.398567
)), .Names = c("date", "adCost"), row.names = c(NA, 59L), class = "data.frame", totals = list(
structure(list(adCost = "20632.193244"), .Names = "adCost")), minimums = list(
structure(list(adCost = "0.0"), .Names = "adCost")), maximums = list(
structure(list(adCost = "781.458877"), .Names = "adCost")), isDataGolden = TRUE, rowCount = 59L)
Answer 0 (score: 0)
Your approach looks correct. However, the expected output of sum(data_feb$adCost) should be 10537.1. You can quickly verify it with the following.
data_feb <- data %>%
  filter(date >= "2018-02-01", date <= "2018-02-28")
data_Jan <- data %>%
  filter(date < "2018-02-01")
sum(data_Jan$adCost)
[1] 10095.09
sum(data_Jan$adCost) + sum(data_feb$adCost)
[1] 20632.19
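Answer 1 (score: 0)
It says so right there:
start_date <- "2018-01-01"
Your dataset starts a month earlier, on January 1, so the full-range total covers both January and February. Here is a check.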
data %>%
  group_by(date >= "2018-02-01") %>%
  summarise(sum = sum(adCost))
# # A tibble: 2 x 2
#   `date >= "2018-02-01"`    sum
#   <lgl>                   <dbl>
# 1 FALSE                  10095.
# 2 TRUE                   10537.
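The same check can be sketched in base R, without dplyr, by rebuilding the posted data and summing per calendar month. The variable names ad_cost, dates, and monthly are chosen here for illustration. The date sequence assumes, as in the posted dput output, 59 consecutive days starting on 2018-01-01.

```r
# adCost values copied from the data posted in the question (59 daily rows).
ad_cost <- c(0, 0, 212.788901, 201.660582,
  677.926913, 526.440256, 522.998839, 135.469596, 234.080656, 173.389505,
  299.499735, 234.691749, 235.785283, 534.545275, 19.136849, 290.011717,
  545.737919, 730.416558, 550.047731, 508.84722, 246.463323, 315.741935,
  310.338589, 417.858737, 312.525658, 4.953066, 189.020612, 724.337794,
  65.547729, 199.248374, 675.579031, 374.50332, 429.758963, 624.922665,
  137.785316, 238.551281, 471.924357, 353.758332, 176.251992, 355.109168,
  0, 0, 178.406897, 491.44716, 540.624039, 601.797631, 543.518688,
  254.214552, 264.345825, 240.127257, 781.458877, 704.10741, 650.427743,
  355.109168, 181.719663, 178.246083, 356.202702, 501.385456, 551.398567)

# One date per row: 2018-01-01 through 2018-02-28.
dates <- seq(as.Date("2018-01-01"), by = "day", length.out = length(ad_cost))

# Sum adCost per year-month; the names of the result are "2018-01" and "2018-02".
monthly <- tapply(ad_cost, format(dates, "%Y-%m"), sum)
print(round(monthly, 2))

# The two monthly totals add up to the full-range total.
print(round(sum(monthly), 2))
```

On the posted data this gives roughly 10095.09 for January and 10537.10 for February, which together make up the 20632.19 full-range total, in line with both answers.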