How to optimize a non-recursive flattening algorithm

Time: 2018-05-30 17:51:48

Tags: algorithm optimization google-apps-script large-data flatten

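I am using the Google Apps Script environment to create a Community Connector for Google's Data Studio. The idea is to call an external API and parse the data in such a way that Data Studio can correctly build charts or tables from it.

I am currently trying to reduce load times, and I have found that most of the time is spent transforming the API response from this...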

{
  campaigns: [{
    name_campaign: Example_Name,
    ads: [
      {cost: 15.98}
    ],
    budget: 237.59
  },
  {
    name_campaign: Another_Name,
    ads: [
      {cost: 98.25},
      {cost: 13.03},
      {cost: 23.44}
    ],
    budget: 15.50
  },
  ...
  ]
}

...into the form Data Studio expects, which is a flat data set:

[
  [237.59, Example_Name, 15.98],
  [15.50, Another_Name, 98.25],
  [15.50, Another_Name, 13.03],
  [15.50, Another_Name, 23.44]
]
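
To make the target transformation concrete, here is a minimal sketch for the fixed campaigns/ads shape shown above only. The function name `flattenCampaigns` and the variable `sampleResponse` are hypothetical, and the field names are hard-coded, which the real connector cannot assume.

```javascript
// Hypothetical helper: only valid for the fixed campaigns -> ads shape above.
function flattenCampaigns(response) {
  var rows = [];
  var campaigns = response.campaigns;
  for (var i = 0; i < campaigns.length; i++) {
    var campaign = campaigns[i];
    for (var j = 0; j < campaign.ads.length; j++) {
      // One output row per ad, with the campaign's own fields repeated.
      rows.push([campaign.budget, campaign.name_campaign, campaign.ads[j].cost]);
    }
  }
  return rows;
}

// The sample response from above, written as a valid object literal.
var sampleResponse = {
  campaigns: [
    { name_campaign: 'Example_Name', ads: [{ cost: 15.98 }], budget: 237.59 },
    {
      name_campaign: 'Another_Name',
      ads: [{ cost: 98.25 }, { cost: 13.03 }, { cost: 23.44 }],
      budget: 15.50
    }
  ]
};

var flatRows = flattenCampaigns(sampleResponse);
```

Each ad becomes one row, and the parent campaign's values are duplicated onto every row it produces.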

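The tricky part is that, because the calls to the API are built dynamically, there is no way to know in advance how many fields or how many different objects there will be. There might be just one object with two fields, or something like the example above. Fortunately, the nesting is limited to parent/child relationships, with no siblings.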
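
Since the field names are not known ahead of time, each level has to be inspected at run time. The sketch below shows one way to do that, assuming at most one nested field per level as described above. The helper name `splitNode` is hypothetical, and `null` values are treated as ordinary scalar values here.

```javascript
// Hypothetical helper: splits one node into its scalar values and the name of
// its single nested field (null when the node has no nested field).
function splitNode(node) {
  var scalars = [];
  var childKey = null;
  for (var key in node) {
    var value = node[key];
    if (value !== null && typeof value === 'object') {
      // Assumption from the question: parent/child only, so the first
      // nested field found is the only one that matters.
      if (childKey === null) {
        childKey = key;
      }
    } else {
      scalars.push(value);
    }
  }
  return { scalars: scalars, childKey: childKey };
}

var campaign = { name_campaign: 'Example_Name', ads: [{ cost: 15.98 }], budget: 237.59 };
var parts = splitNode(campaign);
// parts.scalars holds 'Example_Name' and 237.59; parts.childKey is 'ads'.
```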

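So far, the algorithm is a non-recursive loop that looks for the deepest object while keeping all the fields from the previous object/iteration. If there is a deeper object, the process repeats. If not, it takes the fields of the lowest object plus all previously stored fields, concatenates them, and appends the result as a row to be returned. The code is also commented with the logic, which will hopefully help.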

function stackIt(data) {
  var totalData = [];

  //create stack with highest object
  var stack = [{
    v: data,
    parent_fields: []
  }];

  var stackLen = stack.length;
  var pushing = 0, totalFields = 0;
  var data_fields, array_field, cl, v, current_fields, temp, arr_len, row, parentFields;

  while (stackLen > 0) {
    //store current node 
    cl = stack.pop();
    if (cl === undefined)
      break;

    v = cl.v;
    parentFields = cl.parent_fields.slice(0);

    //fill new array with all parents
    data_fields = parentFields.slice(0);
    array_field = null;
    current_fields = [];

    //Does another object exist?
    //Keep track of all current fields in case no object exists below
    for (var field in v) {
      //Hold the current field
      temp = v[field];
      current_fields.push(temp);

      //found an object. So we know we need to move deeper with the current object
      if (typeof(temp) === 'object' && temp != null && array_field == null)
        array_field = field;
      //store current parent fields
      if (typeof(temp) !== "object")
        data_fields.push(temp)
    }

    //Push new nodes to the stack to delve deeper,
    //each one with the parent data points to tack on later for the deepest-level objects
    if (array_field != null) {
      for (var i = 0, arr_len = v[array_field].length; i < arr_len; i++) {
        //Skip broken fields
        if ('errors' in v[array_field][i])
          continue;

        stack.push({
          v: v[array_field][i],
          parent_fields: data_fields
        });
      }      
    }
    //No object exists below
    else {
      row = [];
      //reset data fields if no object was found
      //data_fields is changed in the loop above in case there is an object below
      data_fields = parentFields.slice(0);

      //get total number of fields that should exist
      //this is to prevent pushing rows with only a subset of fields
      //only do once at deepest object to get total number of fields expected
      if (pushing == 0) {
        pushing = 1;
        totalFields = data_fields.length + current_fields.length;
      }

      //push parents fields (ex. camp.cost)
      row = data_fields.splice(0);

      //push all current fields held (ex. keyword.cost)
      row = row.concat(current_fields);
      //comes out to camp.cost, keyword.cost

      //Push all rows at lowest level; exit if done.
      if (row.length != totalFields) {
        console.log("End Stack"); 
        return totalData;
      }
      else
        totalData.push(row);
    }
  }

  console.log("End Stack with empty stack"); 
  return totalData;
}
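
One direction that may be worth measuring is reducing the number of array copies made per node. `stackIt` copies the parent fields several times with `slice`, `splice` and `concat`. The sketch below is a hypothetical variant, not a benchmarked replacement: the name `flattenNested` is made up, it leaves out the `totalFields` consistency check, and it keeps `null` values as ordinary fields at every level. Whether it is actually faster in Apps Script would need to be timed on real responses.

```javascript
// Hypothetical variant of stackIt: one array copy per node, no length check.
function flattenNested(data) {
  var rows = [];
  var stack = [{ node: data, prefix: [] }];

  while (stack.length > 0) {
    var entry = stack.pop();
    var node = entry.node;
    var scalars = [];
    var children = null;

    for (var key in node) {
      var value = node[key];
      if (value !== null && typeof value === 'object') {
        // Parent/child only: the first nested field is the one to descend into.
        if (children === null) {
          children = value;
        }
      } else {
        scalars.push(value);
      }
    }

    // Single copy: parent values followed by this node's own values.
    var combined = entry.prefix.concat(scalars);

    if (children === null) {
      // Leaf node: the combined array is already the finished row.
      rows.push(combined);
      continue;
    }

    // Push in reverse so rows come out in the same order as the input.
    for (var i = children.length - 1; i >= 0; i--) {
      var child = children[i];
      if ('errors' in child) {
        continue; // same skip rule for broken entries as in stackIt
      }
      // The prefix array is shared between siblings and never mutated.
      stack.push({ node: child, prefix: combined });
    }
  }
  return rows;
}

var response = {
  campaigns: [
    { name_campaign: 'Example_Name', ads: [{ cost: 15.98 }], budget: 237.59 },
    {
      name_campaign: 'Another_Name',
      ads: [{ cost: 98.25 }, { cost: 13.03 }, { cost: 23.44 }],
      budget: 15.50
    }
  ]
};
var nestedRows = flattenNested(response);
```

In this sketch the columns follow the key order of the input objects (name, budget, cost for the sample), so a real connector would still need to map them to the field order Data Studio requested.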

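I have tried the open-source library cFlatten, but it made the flattening much slower. Most of the tools I have found only handle objects 1-2 levels deep and require specific formatting. The algorithm used to be recursive, but that was much slower because of the amount of data being returned. Any ideas on how to make the function faster or more efficient would be greatly appreciated.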

0 answers