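We have a system that queries the GA API for a large number of tokens, mainly for website visit and session data.
We recently noticed that we get strange results when querying the API: records are missing from the result set. More specifically, when a result spans several pages of rows, the rows at the beginning of the next page seem to be skipped.
This behavior is inconsistent. Each run shows the error for a different set of sites/tokens, and when I tried to debug the code manually I never hit it.
At first I thought the problem was in our code, perhaps a race condition or shared memory, but the problem seems to be in the API access itself. I checked the TotalResults property returned with the query, and when the error occurs it reports fewer total rows than I see when I query manually.
For example, we queried a site with date and country dimensions and logged the following behavior: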
domain | year | month | day | country | metrics
-----------------------------------------------
X.com 2017 09 22 IT ..... // metrics
// finished result page
X.com 2017 09 24 BW ..... // metrics
....
Total rows - 1295
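When we ran the same code again, we got the rows with 2017-09-23 values for this site, and a total row count of 1368.
Is this a bug in the API? Or perhaps in the way we access it? I haven't found any reports of a similar issue.
Edit: I have added the code of the method we use to call the API.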
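One way to make the TotalResults observation easier to reproduce is to record what each page reported and compare the totals afterwards. The following is a minimal sketch, not part of our production code: `PageInfo` and `PagingDiagnostics` are hypothetical names, and the sample values in `Main` are illustrative only.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record of what a single page of a paged query reported.
public sealed class PageInfo
{
    public int StartIndex { get; }
    public int RowCount { get; }
    public int TotalResults { get; }

    public PageInfo(int startIndex, int rowCount, int totalResults)
    {
        StartIndex = startIndex;
        RowCount = rowCount;
        TotalResults = totalResults;
    }
}

public static class PagingDiagnostics
{
    // Returns one warning per page whose reported total differs
    // from the total reported by the first page.
    public static List<string> FindTotalDrift(IReadOnlyList<PageInfo> pages)
    {
        var warnings = new List<string>();
        if (pages.Count == 0)
        {
            return warnings;
        }

        int expected = pages[0].TotalResults;
        for (int i = 1; i < pages.Count; i++)
        {
            if (pages[i].TotalResults != expected)
            {
                warnings.Add("page " + (i + 1) + " (start " + pages[i].StartIndex +
                    "): total " + pages[i].TotalResults +
                    ", first page reported " + expected);
            }
        }
        return warnings;
    }
}

public static class Program
{
    public static void Main()
    {
        // Illustrative values only, not real API output.
        var pages = new List<PageInfo>
        {
            new PageInfo(1, 1000, 1368),
            new PageInfo(1001, 295, 1295)
        };

        foreach (string warning in PagingDiagnostics.FindTotalDrift(pages))
        {
            Console.WriteLine(warning);
        }
    }
}
```

If the total changes between pages of the same query, the underlying result set changed while it was being paged, which would point away from a race condition in the calling code.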
private GaDataFlat GetDataV3(string type, string profileID,
    List<Metric> v4metrics, List<MetricFilterClause> v4metricFilters,
    List<Dimension> v4dimensions, List<DimensionFilterClause> v4dimensionFilters,
    List<OrderBy> v4sorting, DateTime start, DateTime end, int maxResults)
{
    // Flatten the v4-style request objects into the strings the v3 API expects.
    List<string> metrics = (v4metrics == null ? null : v4metrics.Select(x => x.Expression).ToList());
    List<string> dimensions = (v4dimensions == null ? null : v4dimensions.Select(x => x.Name).ToList());
    List<string> sorting = (v4sorting == null ? null : v4sorting.Select(x => x.FieldName).ToList());
    List<string> filters = (v4dimensionFilters == null ? null : v4dimensionFilters.Select(x => deconstructFilter(x)).ToList());
    // Label used only for console logging; guards against a null filter list.
    string filterLabel = v4dimensionFilters == null ? "" :
        string.Join(":", v4dimensionFilters.SelectMany(f => f.Filters.SelectMany(inner => inner.Expressions)));
    return ExponentialBackoff.Go(() =>
    {
        var gaData = new GaDataFlat { DataTable = new DataTable() };
        DataResource.GaResource.GetRequest request = service.Data.Ga.Get("ga:" + profileID,
            start.ToString("yyyy-MM-dd"), end.ToString("yyyy-MM-dd"), String.Join(",", metrics));
        // Use a distinct quotaUser per profile and thread to avoid the concurrent-request limit
        request.QuotaUser = profileID + Thread.CurrentThread.ManagedThreadId;
        if (dimensions != null)
        {
            request.Dimensions = string.Join(",", dimensions);
        }
        if (filters != null)
        {
            // ";" combines filters with AND
            request.Filters = string.Join(";", filters);
        }
        if (sorting != null)
        {
            // Sort fields are comma-separated in v3; the "-" prefix means descending
            request.Sort = "-" + string.Join(",-", sorting);
        }
        request.SamplingLevel = DataResource.GaResource.GetRequest.SamplingLevelEnum.HIGHERPRECISION;
        bool hasNext;
        int rowCount = 0;
        int iteration = 0;
        do
        {
            iteration++;
            MetricsProvider.Counter("ga.iteration", 1, "type:" + type);
            // Safety valve against an endless paging loop
            if (iteration > 100)
            {
                string error = "Too many iterations ";
                LogFacade.Fatal(error);
                throw new Exception(error);
            }
            if (!counter.IncrementAndCheckAvailablility(Constants.APIS.GA))
            {
                Console.WriteLine("Daily Limit Exceeded - counter");
                throw new QuotaExceededException();
            }
            GaData DataList = request.Execute();
            gaData.SampleSize = DataList.SampleSize;
            gaData.SampleSpace = DataList.SampleSpace;
            if (DataList.Rows != null)
            {
                // Create the table columns from the first page that contains rows
                if (gaData.DataTable.Columns.Count == 0)
                {
                    for (int j = 0; j < DataList.ColumnHeaders.Count; j++)
                    {
                        gaData.DataTable.Columns.Add(new DataColumn
                        {
                            ColumnName = DataList.ColumnHeaders[j].Name
                        });
                    }
                }
                foreach (var row in DataList.Rows.ToList())
                {
                    var reportRow = new List<object>();
                    for (int j = 0; j < DataList.ColumnHeaders.Count; j++)
                    {
                        reportRow.Add(row[j]);
                    }
                    Console.WriteLine(filterLabel + "," +
                        string.Join(",", reportRow.Select(cell => cell.ToString())));
                    gaData.DataTable.Rows.Add(reportRow.ToArray());
                }
                rowCount += DataList.Rows.Count;
                // start-index is 1-based, so the next page begins at rowCount + 1
                request.StartIndex = rowCount + 1;
                Console.WriteLine(filterLabel + ", next page starts " + request.StartIndex);
                hasNext = rowCount < DataList.TotalResults;
            }
            else
            {
                hasNext = false;
            }
        } while (hasNext && (maxResults == 0 || rowCount < maxResults));
        return gaData;
    }, type, "GetData " + profileID + " " + Thread.CurrentThread.ManagedThreadId);
}
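As a cross-check on the paging arithmetic, the start index of every page can be computed independently of the API. This is a small sketch assuming the v3 start-index parameter is 1-based and that the page size is 1000 rows (the v3 default when max-results is not set); `PageOffsets` is a hypothetical helper, not part of the client library.

```csharp
using System;
using System.Collections.Generic;

public static class PageOffsets
{
    // Yields the 1-based start index of each page needed
    // to read totalResults rows in pages of pageSize rows.
    public static IEnumerable<int> StartIndices(int totalResults, int pageSize)
    {
        if (pageSize <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(pageSize));
        }

        for (int fetched = 0; fetched < totalResults; fetched += pageSize)
        {
            yield return fetched + 1;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // 1295 rows in pages of 1000: prints 1, then 1001.
        foreach (int startIndex in PageOffsets.StartIndices(1295, 1000))
        {
            Console.WriteLine(startIndex);
        }
    }
}
```

Comparing these values with the "next page starts" log line shows whether a page boundary overlaps the previous page or leaves a gap.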
Edit: the filters we use are consistent. For example, to get desktop visits for the site x.com, the filter would be:
ga:hostname=~x\.com(\/|)$;ga:deviceCategory==desktop
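To see which hostnames the regular expression part of this filter accepts, the pattern can be tried locally. This sketch uses .NET's `Regex` class as a stand-in, so it assumes the pattern behaves the same way in GA's regex dialect, which may differ (for example in case sensitivity). `HostnameFilterCheck` and the sample hostnames are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HostnameFilterCheck
{
    // Same pattern text as the ga:hostname filter above.
    public const string Pattern = @"x\.com(\/|)$";

    public static bool Matches(string hostname)
    {
        return Regex.IsMatch(hostname, Pattern);
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical hostnames, chosen to probe both ends of the pattern.
        string[] hosts = { "x.com", "www.x.com", "x.com/", "notx.com", "x.com.au" };
        foreach (string host in hosts)
        {
            Console.WriteLine(host + ": " + HostnameFilterCheck.Matches(host));
        }
    }
}
```

Because the pattern is anchored only at the end, a hostname such as notx.com also matches under .NET's rules, while x.com.au does not.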