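I have a fairly large number of URLs (> 8,500) for which I want to query the Google Analytics API using R. I am using the googleAnalyticsR package. The problem is that I can loop over my set of URLs, but the resulting data frame only contains the totals for the host ID in every row (i.e., every row has the same values).
Here is how far I have got: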
library(googleAnalyticsR)
library(lubridate)
#Authorize with google
ga_auth()
ga.acc.list = ga_account_list()
my.id = 123456
#set time range
soty = floor_date(Sys.Date(), "year")
yesterday = floor_date(Sys.Date(), "day") - days(1)
#get some - in this case - random URLs
urls = c("example.com/de/", "example.com/us/", "example.com/en/")
urls = gsub("^example.com/", "ga:pagePath=~", urls)
df = data.frame()
#get data
for(i in urls){
  ga.data = google_analytics_4(my.id,
                               date_range = c(soty, yesterday),
                               metrics = c("pageviews","avgTimeOnPage","entrances","bounceRate","exitRate"),
                               filters = urls[i])
  df = rbind(df, ga.data)
}
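The result is that every row of the resulting data frame always contains the overall statistics for the my.id domain (my own data).
Does anyone know a better way to solve this, or does Google Analytics simply prevent us from querying it this way?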
Answer 0 (score: 1)
What you are getting is expected: you only queried metrics (c("pageviews","avgTimeOnPage","entrances","bounceRate","exitRate")), so you only get metrics back.
If you want to break those metrics down, you need to use dimensions:
https://developers.google.com/analytics/devguides/reporting/core/dimsmets
In your case you are interested in the ga:pagePath dimension, so something like this (untested code):
ga.data = google_analytics_4(my.id,
                             date_range = c(soty, yesterday),
                             dimensions = c("pagePath"),
                             metrics = c("pageviews","avgTimeOnPage","entrances","bounceRate","exitRate"),
                             # inside for(i in urls), i is already the filter string;
                             # urls[i] would index by name and give NA
                             filters = i)
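A separate detail in the question's loop is worth checking. `for(i in urls)` binds `i` to each element of `urls` (a string), so `urls[i]` looks up an unnamed vector by name and returns `NA`. If the filter never reaches the API as intended, that would plausibly explain why every row shows the same totals. The sketch below uses only base R to show the effect; `fetch_stub()` is a hypothetical stand-in for the API call, not a googleAnalyticsR function, and its return values are made up.

```r
urls <- c("ga:pagePath=~de/", "ga:pagePath=~us/", "ga:pagePath=~en/")

# What the original loop passes as the filter: urls[i] with i being a string.
# An unnamed character vector has no element called "ga:pagePath=~de/", so
# each lookup yields NA.
bad_filters <- character(0)
for (i in urls) {
  bad_filters <- c(bad_filters, urls[i])
}

# Hypothetical stand-in for the API call (assumption, for illustration only).
# It echoes the filter it received next to a dummy metric.
fetch_stub <- function(filter) {
  data.frame(filter = filter,
             pageviews = nchar(filter),
             stringsAsFactors = FALSE)
}

# Iterate over the elements directly and bind the results once at the end,
# which also avoids growing a data frame with rbind() on every iteration.
results <- lapply(urls, fetch_stub)
df <- do.call(rbind, results)
```

Collecting the per-URL results in a list and binding them once tends to scale better than repeated `rbind()` calls when there are thousands of URLs.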
I suggest you use the Google Analytics Query Explorer until you get the results you want, and then move the query to R.
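As for the number of results, by default it may be limited to 1K until you increase max_rows. There is a hard limit of 10K from the API, which means you have to use pagination to retrieve more results if needed. I have seen some examples in the R documentation with max = 99999999; I don't know whether the R library automatically handles pagination beyond the first 10K, or whether they are unaware of the hard limit: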
batch_gadata <- google_analytics(id = ga_id,
                                 start = "2014-08-01", end = "2015-08-02",
                                 metrics = c("sessions", "bounceRate"),
                                 dimensions = c("source", "medium",
                                                "landingPagePath",
                                                "hour", "minute"),
                                 max = 99999999)
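If the library turns out not to page for you, the general shape of manual pagination is a loop that requests one page at a time until a short or empty page comes back. The following is a minimal base R sketch of that loop only: `fetch_page()` is a hypothetical stand-in that simulates a source with 25,000 rows, not a googleAnalyticsR function, and the real API has its own paging parameters.

```r
# Hypothetical stand-in for one API request (assumption, for illustration).
# Returns at most `page_size` rows starting at 1-based row `start_index`.
fetch_page <- function(start_index, page_size, total_rows = 25000) {
  if (start_index > total_rows) {
    return(data.frame(row_id = integer(0)))
  }
  end_index <- min(start_index + page_size - 1, total_rows)
  data.frame(row_id = seq(start_index, end_index))
}

# Request pages of at most `page_size` rows until the source is exhausted.
fetch_all <- function(page_size = 10000) {
  pages <- list()
  start_index <- 1
  repeat {
    page <- fetch_page(start_index, page_size)
    if (nrow(page) == 0) break           # nothing left to fetch
    pages[[length(pages) + 1]] <- page
    if (nrow(page) < page_size) break    # short page means this was the last one
    start_index <- start_index + page_size
  }
  do.call(rbind, pages)
}

all_rows <- fetch_all()
```

With the simulated 25,000 rows and a page size of 10,000, the loop makes three requests and stops on the short third page.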