Getting complete, unsampled data from GA using R / rga

Time: 2015-04-02 10:49:08

Tags: r google-analytics google-analytics-api

I'm using ga$getData from skardhamar's rga package to query GA and retrieve all of the data unsampled. The data is based on more than 500,000 sessions per day.

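At https://github.com/skardhamar/rga, the paragraph 'extracting more observations than 10,000' says this is possible with batch = TRUE. The paragraph 'Get the data unsampled' also says that by walking over the date range one day at a time you can get unsampled data. I'm trying to combine the two, but I can't get it to work. For example: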

ga$getData(xxx,
    start.date = "2015-03-30", 
    end.date = "2015-03-31",
    metrics = "ga:totalEvents", 
    dimensions = "ga:date,ga:customVarValue4,ga:eventCategory,ga:eventAction,ga:eventLabel", 
    sort = "", 
    filters = "", 
    segment = "",
    batch = TRUE, walk = TRUE
    )

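This does return unsampled data, but not all of it. I get a data frame with only 20k rows (10k per day). Contrary to what I expected from setting batch = TRUE, the result is capped at a single 10k chunk per day. So for March 30 and 31, I end up with a 20k-row data frame after seeing this output: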

Run (1/2): for date 2015-03-30
Pulling 10000 observations in batches of 10000
Run (1/1): observations [1;10000]. Batch size: 10000
Received: 10000 observations
Received: 10000 observations
Run (2/2): for date 2015-03-31
Pulling 10000 observations in batches of 10000
Run (1/1): observations [1;10000]. Batch size: 10000
Received: 10000 observations
Received: 10000 observations

When I leave out the walk = TRUE setting, I do get all observations (771k rows, about 335k per day), but only as sampled data:

ga$getData(xxx,
   start.date = "2015-03-30", 
   end.date = "2015-03-31",
   metrics = "ga:totalEvents", 
   dimensions = "ga:date,ga:customVarValue4,ga:eventCategory,ga:eventAction,ga:eventLabel", 
   sort = "", 
   filters = "", 
   segment = "",
   batch = TRUE
   )

Notice: Data set contains sampled data
Pulling 771501 observations in batches of 10000
Run (1/78): observations [1;10000]. Batch size: 10000
Notice: Data set contains sampled data
...

Is my data simply too large to retrieve all observations unsampled?

1 answer:

Answer 0: (score: 0)

You could try querying by device with filters = "ga:deviceCategory==desktop" (and filters = "ga:deviceCategory!=desktop" respectively), and then merging the resulting data frames.

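I'm assuming your users visit your site from different devices. The underlying logic is that when you filter the data, the Google Analytics servers apply the filter before returning it to you, so you can "divide" your query into smaller pieces and get unsampled data. I think this is the same methodology as the "walk" feature.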
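The "divide the query" idea can be sketched generically as: split the date range into single days, run one request per day and per filter, and stack the pieces. This is only an illustration under assumptions. `fetch_partitioned`, `fetch_piece`, and `fake_fetch` are hypothetical names that are not part of rga, and the stand-in fetcher returns made-up rows so the sketch runs without GA credentials. Whether each piece actually comes back unsampled depends on how much data falls into it.

```r
# Hypothetical helper (not part of rga): split a date range and a set of
# filters into small pieces, fetch each piece separately, and stack the results.
# `fetch_piece` is an assumption: any function(day, filter) returning a data frame.
fetch_partitioned <- function(start_date, end_date, filters, fetch_piece) {
  days <- seq(as.Date(start_date), as.Date(end_date), by = "day")
  pieces <- list()
  for (i in seq_along(days)) {
    day <- format(days[i], "%Y-%m-%d")
    for (f in filters) {
      pieces[[length(pieces) + 1]] <- fetch_piece(day, f)
    }
  }
  do.call(rbind, pieces)
}

# Stand-in fetcher with made-up data. With rga, this is where a single-day
# ga$getData(...) call would go, using start.date = end.date = day
# and filters = filter.
fake_fetch <- function(day, filter) {
  data.frame(date = day, filter = filter, totalEvents = 1L,
             stringsAsFactors = FALSE)
}

result <- fetch_partitioned(
  "2015-03-30", "2015-03-31",
  c("ga:deviceCategory==desktop", "ga:deviceCategory!=desktop"),
  fake_fetch
)
print(nrow(result))
```

Passing the fetcher in as an argument keeps the partitioning logic separate from the GA call, so the loop can be tried on dummy data first.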

Desktop only

ga$getData(xxx,
    start.date = "2015-03-30",
    end.date = "2015-03-31",
    metrics = "ga:totalEvents",
    dimensions = "ga:date,ga:customVarValue4,ga:eventCategory,ga:eventAction,ga:eventLabel",
    sort = "",
    filters = "ga:deviceCategory==desktop",
    segment = "",
    batch = TRUE, walk = TRUE
    )

Mobile and tablet

ga$getData(xxx,
    start.date = "2015-03-30",
    end.date = "2015-03-31",
    metrics = "ga:totalEvents",
    dimensions = "ga:date,ga:customVarValue4,ga:eventCategory,ga:eventAction,ga:eventLabel",
    sort = "",
    filters = "ga:deviceCategory!=desktop",
    segment = "",
    batch = TRUE, walk = TRUE
    )
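Merging the two results could look roughly like the sketch below. The data frames `desktop` and `non_desktop` are toy stand-ins with made-up values and a reduced set of columns, not real GA output. One point to keep in mind: ga:deviceCategory is not among the queried dimensions, so the same dimension combination can show up once in each partition, and the rows probably need to be summed after stacking.

```r
# Toy stand-ins for the two query results (made-up values, not real GA output).
desktop <- data.frame(
  date = c("20150330", "20150331"),
  eventCategory = c("video", "video"),
  totalEvents = c(10, 12),
  stringsAsFactors = FALSE
)
non_desktop <- data.frame(
  date = c("20150330", "20150331"),
  eventCategory = c("video", "video"),
  totalEvents = c(4, 7),
  stringsAsFactors = FALSE
)

# Stack the two partitions; both must have the same columns for rbind to work.
combined <- rbind(desktop, non_desktop)

# The same dimension combination appears once per partition here, so sum the
# metric to get back to one row per combination.
totals <- aggregate(totalEvents ~ date + eventCategory, data = combined, FUN = sum)
print(totals)
```

With the real results, the grouping formula would list every queried dimension column on the right-hand side.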