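I am using skardhamar's rga ga$getData to query GA and retrieve all of the data unsampled. The data is based on more than 500,000 sessions per day.
At https://github.com/skardhamar/rga, the section 'Extracting more observations than 10,000' mentions that this is possible by using batch = TRUE. The section 'Get the data unsampled' also mentions that by walking over days you can get unsampled data. I am trying to combine the two, but I cannot get it to work. E.g.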
ga$getData(xxx,
start.date = "2015-03-30",
end.date = "2015-03-31",
metrics = "ga:totalEvents",
dimensions = "ga:date,ga:customVarValue4,ga:eventCategory,ga:eventAction,ga:eventLabel",
sort = "",
filters = "",
segment = "",
batch = TRUE, walk = TRUE
)
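..does return unsampled data, but not all of it. The data frame I get has only 20k rows (10k per day). Contrary to what I expected from the batch = TRUE setting, each day is limited to a single chunk of 10k rows. So for the query starting on 30 March, I get a 20k-row data frame after seeing this output: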
Run (1/2): for date 2015-03-30
Pulling 10000 observations in batches of 10000
Run (1/1): observations [1;10000]. Batch size: 10000
Received: 10000 observations
Received: 10000 observations
Run (2/2): for date 2015-03-31
Pulling 10000 observations in batches of 10000
Run (1/1): observations [1;10000]. Batch size: 10000
Received: 10000 observations
Received: 10000 observations
When I leave out the walk = TRUE setting, I do get all observations (771k rows, about 335k per day), but only sampled:
ga$getData(xxx,
start.date = "2015-03-30",
end.date = "2015-03-31",
metrics = "ga:totalEvents",
dimensions = "ga:date,ga:customVarValue4,ga:eventCategory,ga:eventAction,ga:eventLabel",
sort = "",
filters = "",
segment = "",
batch = TRUE
)
Notice: Data set contains sampled data
Pulling 771501 observations in batches of 10000
Run (1/78): observations [1;10000]. Batch size: 10000
Notice: Data set contains sampled data
...
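Is my data too large to retrieve all observations unsampled?
Answer 0 (score: 0)
You could try querying by device with filters = "ga:deviceCategory==desktop" (and filters = "ga:deviceCategory!=desktop" respectively), then merging the resulting data frames.
I am assuming your users visit your site from different devices. The underlying logic is that when you filter the data, the Google Analytics servers filter it before you receive it, so you can "split" your query and get unsampled data. I think it is the same methodology as the "walk" function.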
ga$getData(xxx,
start.date = "2015-03-30",
end.date = "2015-03-31",
metrics = "ga:totalEvents",
dimensions = "ga:date,ga:customVarValue4,ga:eventCategory,ga:eventAction,ga:eventLabel",
sort = "",
filters = "ga:deviceCategory==desktop",
segment = "",
batch = TRUE, walk = TRUE
)
ga$getData(xxx,
start.date = "2015-03-30",
end.date = "2015-03-31",
metrics = "ga:totalEvents",
dimensions = "ga:date,ga:customVarValue4,ga:eventCategory,ga:eventAction,ga:eventLabel",
sort = "",
filters = "ga:deviceCategory!=desktop",
segment = "",
batch = TRUE, walk = TRUE
)
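To merge the two results, something along these lines might work. This is only a sketch: fetch_split and fake_fetch are hypothetical names, not part of rga, and fake_fetch stands in for the real ga$getData call so the example runs without GA credentials. Because ga:deviceCategory is not among the requested dimensions, the same combination of dimension values can come back from both queries, so the sketch sums ga:totalEvents per key after stacking the parts.

```r
# Hypothetical helper (not part of rga): run one query per filter and
# stack the resulting data frames.
# `fetch` is any function that takes a filter string and returns a data frame.
fetch_split <- function(fetch, filters) {
  parts <- lapply(filters, fetch)
  combined <- do.call(rbind, parts)
  rownames(combined) <- NULL
  combined
}

# Stand-in for the real query, with made-up rows for illustration only.
# With rga this would presumably be something like:
#   function(f) ga$getData(xxx, ..., filters = f, batch = TRUE, walk = TRUE)
fake_fetch <- function(filter) {
  if (filter == "ga:deviceCategory==desktop") {
    data.frame(date = "20150330", eventCategory = "video",
               totalEvents = 12, stringsAsFactors = FALSE)
  } else {
    data.frame(date = c("20150330", "20150331"),
               eventCategory = c("video", "video"),
               totalEvents = c(5, 7), stringsAsFactors = FALSE)
  }
}

filters <- c("ga:deviceCategory==desktop", "ga:deviceCategory!=desktop")
events <- fetch_split(fake_fetch, filters)

# The same key can appear once per device split, so sum the counts per key.
totals <- aggregate(totalEvents ~ date + eventCategory,
                    data = events, FUN = sum)
```

If a filtered day still comes back with exactly 10,000 rows, the per-day cap described in the question is probably still cutting it off, and a finer split would be needed.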