Mismatch when extracting data from the Google Analytics API (Python)

Posted: 2019-01-04 17:00:37

Tags: python google-analytics google-analytics-api segment

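I'm writing a script to pull data from the Google Analytics Reporting API v4. The script itself works. However, when I validate the output by comparing the numbers in the GA interface with the extracted data, I see some discrepancies. They are small, but I don't understand why the numbers differ at all.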

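I should mention that the script uses a dynamic segment with exactly the same conditions as a segment I have set up in the GA view. The segment simply filters out spam traffic by keeping only traffic with a session duration greater than 1 second.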

This is the request I'm sending:

# `analytics` is the Reporting API v4 service object; `view_id` is defined elsewhere.
no_spam_traffic = analytics.reports().batchGet(
    body={
        "reportRequests": [
            {
                "viewId": view_id,
                "dimensions": [
                    {"name": "ga:date"},
                    {"name": "ga:sourceMedium"},
                    {"name": "ga:campaign"},
                    {"name": "ga:adContent"},
                    {"name": "ga:channelGrouping"},
                    {"name": "ga:segment"},  # required when the request uses segments
                ],
                "dateRanges": [{"startDate": "2018-12-16", "endDate": "2018-12-20"}],
                "metrics": [{"expression": "ga:sessions", "alias": "sessions"}],
                "segments": [
                    {
                        "dynamicSegment": {
                            "name": "sessions_no_spam",
                            "userSegment": {
                                "segmentFilters": [
                                    {
                                        "simpleSegment": {
                                            # orFiltersForSegment is documented as a list of OR groups
                                            "orFiltersForSegment": [
                                                {
                                                    "segmentFilterClauses": [
                                                        {
                                                            "metricFilter": {
                                                                "metricName": "ga:sessionDuration",
                                                                "operator": "GREATER_THAN",
                                                                "comparisonValue": "1",
                                                            }
                                                        }
                                                    ]
                                                }
                                            ]
                                        }
                                    }
                                ]
                            },
                        }
                    }
                ],
            }
        ]
    }
).execute()
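One way to validate the export against the GA interface is to total sessions per day from the API response and compare each day with the same segment and date range in the UI. The sketch below assumes the response shape returned by the batchGet() call above (ga:date as the first dimension, sessions as the first metric); `sessions_by_date` is a hypothetical helper name, not part of any library.

```python
from collections import defaultdict


def sessions_by_date(response):
    """Sum the 'sessions' metric per ga:date across every returned row."""
    totals = defaultdict(int)
    for report in response.get("reports", []):
        for row in report.get("data", {}).get("rows", []):
            date = row["dimensions"][0]  # ga:date is the first dimension, e.g. "20181216"
            totals[date] += int(row["metrics"][0]["values"][0])
    return dict(totals)


if __name__ == "__main__":
    sample = {"reports": [{"data": {"rows": [
        {"dimensions": ["20181216", "google / organic", "(not set)", "(not set)", "Organic Search", "sessions_no_spam"],
         "metrics": [{"values": ["10"]}]},
        {"dimensions": ["20181216", "(direct) / (none)", "(not set)", "(not set)", "Direct", "sessions_no_spam"],
         "metrics": [{"values": ["4"]}]},
    ]}}]}
    print(sessions_by_date(sample))  # {'20181216': 14}
```

If a day's total from the API is lower than the UI figure, some rows for that day were probably never returned, which is worth ruling out before suspecting the segment definition.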

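I'm not sure whether the answer will be conceptual rather than technical, but just in case, here is the function that stores the results in the database: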

import psycopg2


def print_results(no_spam_traffic):
    """Insert or update each row of the API response in GA_no_spam_traffic."""
    connection = psycopg2.connect(database='web_insights_data', user='XXXX', password='XXXXX', host='XXX', port='XXXXX')
    try:
        cursor = connection.cursor()
        for report in no_spam_traffic.get('reports', []):
            for row in report.get('data', {}).get('rows', []):
                dims = row['dimensions']
                gadate = dims[0]  # ga:date arrives as YYYYMMDD
                gadate = gadate[0:4] + '/' + gadate[4:6] + '/' + gadate[6:8]
                gasourcemedium, gacampaign, gaadcontent, gachannel = dims[1:5]
                gasessions = row['metrics'][0]['values'][0]

                # Rows are matched on date/sourceMedium/campaign/adContent only (channel is not part of the key)
                key = (gadate, gasourcemedium, gacampaign, gaadcontent)
                cursor.execute("SELECT 1 FROM GA_no_spam_traffic WHERE gadate = %s AND sourcemedium = %s AND campaign = %s AND adcontent = %s", key)
                if cursor.fetchone() is not None:  # update existing entry
                    cursor.execute("UPDATE GA_no_spam_traffic SET sessions = %s WHERE gadate = %s AND sourcemedium = %s AND campaign = %s AND adcontent = %s", (gasessions,) + key)
                else:  # insert new row
                    cursor.execute("INSERT INTO GA_no_spam_traffic (gadate, sourcemedium, campaign, adcontent, channel, sessions) VALUES (%s, %s, %s, %s, %s, %s)", key + (gachannel, gasessions))
        connection.commit()
    finally:
        connection.close()
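For comparison, here is a sketch of the same storage step that separates parsing from the database write and replaces the SELECT-then-UPDATE/INSERT pair with a single PostgreSQL `INSERT ... ON CONFLICT` upsert (PostgreSQL 9.5+). It assumes a UNIQUE constraint on `(gadate, sourcemedium, campaign, adcontent)`, which does not appear in the question; `rows_to_records`, `UPSERT_SQL` and `store_results` are hypothetical names.

```python
def rows_to_records(response):
    """Flatten a batchGet() response into tuples for GA_no_spam_traffic."""
    records = []
    for report in response.get("reports", []):
        for row in report.get("data", {}).get("rows", []):
            # First five dimensions: ga:date, ga:sourceMedium, ga:campaign, ga:adContent, ga:channelGrouping
            date_raw, source_medium, campaign, ad_content, channel = row["dimensions"][:5]
            gadate = f"{date_raw[0:4]}/{date_raw[4:6]}/{date_raw[6:8]}"  # "20181216" -> "2018/12/16"
            sessions = int(row["metrics"][0]["values"][0])
            records.append((gadate, source_medium, campaign, ad_content, channel, sessions))
    return records


# Assumes a UNIQUE constraint on (gadate, sourcemedium, campaign, adcontent);
# ON CONFLICT needs it to detect that a row already exists.
UPSERT_SQL = """
    INSERT INTO GA_no_spam_traffic (gadate, sourcemedium, campaign, adcontent, channel, sessions)
    VALUES (%s, %s, %s, %s, %s, %s)
    ON CONFLICT (gadate, sourcemedium, campaign, adcontent)
    DO UPDATE SET sessions = EXCLUDED.sessions
"""


def store_results(response, **connect_kwargs):
    """Write all records in one transaction (requires psycopg2)."""
    import psycopg2  # imported here so rows_to_records works without a database driver

    records = rows_to_records(response)
    connection = psycopg2.connect(**connect_kwargs)
    try:
        with connection:  # commits on success, rolls back on error
            with connection.cursor() as cursor:
                cursor.executemany(UPSERT_SQL, records)
    finally:
        connection.close()
```

Usage would look like `store_results(no_spam_traffic, database='web_insights_data', user='XXXX', password='XXXXX', host='XXX', port='XXXXX')`. As in the original function, if two rows share the same key (for example, differing only in channel), the later one overwrites the earlier one.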

Any idea what the problem might be? Thanks!

1 Answer

Answer (score: 0)

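I managed to improve it, although the numbers still don't match exactly; the remaining difference is acceptable. The problem was the page size: by default a Reporting API v4 request returns at most 1,000 rows, so the rows beyond that were missing. Increasing the pageSize parameter fixed most of the gap.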

Here is the link to the pagination section of Google's migration guide: https://developers.google.com/analytics/devguides/reporting/core/v4/migration#pagination. Thanks.
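Raising pageSize alone still caps a single response, so a more robust approach is to also follow `nextPageToken` and send it back as `pageToken` until no token is returned. The sketch below assumes `analytics` is a Reporting API v4 service object (for example from `googleapiclient.discovery.build('analyticsreporting', 'v4', ...)`) and that only one ReportRequest is sent per call; `fetch_all_reports` and the default page size of 10,000 are hypothetical choices, not values from the answer.

```python
def fetch_all_reports(analytics, report_request, page_size=10000):
    """Run one ReportRequest, following nextPageToken until every page is fetched.

    Returns a dict shaped like a batchGet() response, with all rows merged
    into a single report.
    """
    request = dict(report_request, pageSize=page_size)
    merged = None
    while True:
        response = analytics.reports().batchGet(body={"reportRequests": [request]}).execute()
        report = response["reports"][0]
        rows = report.get("data", {}).get("rows", [])
        if merged is None:
            merged = report  # keep columnHeader, totals, etc. from the first page
            merged.setdefault("data", {})["rows"] = list(rows)
        else:
            merged["data"]["rows"].extend(rows)
        token = report.get("nextPageToken")
        if not token:
            break
        request = dict(request, pageToken=token)  # ask for the next page
    merged.pop("nextPageToken", None)
    return {"reports": [merged]}
```

Passing the single entry of `reportRequests` from the question as `report_request` returns a response with the same shape as before, so it can be handed straight to the database function.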