Question

The list to be grouped:

const arr = [
  {
    "Global Id": "1231",
    "TypeID": "FD1",
    "Size": 160,
    "Flöde": 55,
  },
  {
    "Global Id": "5433",
    "TypeID": "FD1",
    "Size": 160,
    "Flöde": 100,
  },
  {
    "Global Id": "50433",
    "TypeID": "FD1",
    "Size": 120,
    "Flöde": 100,
  },
  {
    "Global Id": "452",
    "TypeID": "FD2",
    "Size": 120,
    "Flöde": 100,
  },
]

Input to the function, specifying which keys to group by:

const columns = [
    {
      "dataField": "TypeID",
      "summarize": false,
    },
    {
      "dataField": "Size",
      "summarize": false,
    },
    {
      "dataField": "Flöde",
      "summarize": true,
    },
]

Expected output:

const output = [
    {
      "TypeID": "FD1",
      "Size": 160,
      "Flöde": 155, // 55 + 100
      "nrOfItems": 2
    },
    {
       "TypeID": "FD1",
       "Size": 120,
       "Flöde": 100,
       "nrOfItems": 1  
    },
    {
       "TypeID": "FD2",
       "Size": 120,
       "Flöde": 100,
       "nrOfItems": 1  
    }
  ]

  // nrOfItems adds up to 4 (2 + 1 + 1): the total number of items.

The function:

const groupArr = (columns) => R.pipe(...);

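The "summarize" property indicates whether that property should be summed.

The dataset is very large, over 100k items, so I don't want to iterate over it more often than necessary.

I have looked at R.group, but I'm not sure whether it can be applied here?

Maybe something with R.reduce? Store the groups in the accumulator, then sum the values and increment the count if the group already exists? The group needs to be found quickly, so should the groups be stored under a key?

Or is it better to use vanilla JavaScript in this case?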

Answer 1

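Here is an answer in vanilla JavaScript, since I am not very familiar with the Ramda API. I am fairly sure the approach would be very similar with Ramda.

The code has comments explaining each step. I will try to rewrite it with Ramda.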

const arr=[{"Global Id":"1231",TypeID:"FD1",Size:160,"Flöde":55},{"Global Id":"5433",TypeID:"FD1",Size:160,"Flöde":100},{"Global Id":"50433",TypeID:"FD1",Size:120,"Flöde":100},{"Global Id":"452",TypeID:"FD2",Size:120,"Flöde":100}],columns=[{dataField:"TypeID",summarize:!1},{dataField:"Size",summarize:!1},{dataField:"Flöde",summarize:!0}];

// The columns that don't summarize
// give us the keys we need to group on
const groupKeys = columns
  .filter(c => c.summarize === false)
  .map(g => g.dataField);

// We compose a hash function that creates
// a hash out of all the items' properties
// that are in our groupKeys
const groupHash = groupKeys
  .map(k => x => x[k])
  .reduce(
    (f, g) => x => `${f(x)}___${g(x)}`,
    () => "GROUPKEY"
  );

// The columns that summarize tell us which
// properties to sum for the items within the
// same group
const sumKeys = columns
  .filter(c => c.summarize === true)
  .map(c => c.dataField);
  
// Again, we compose into a single function.
// This function concats two items, taking the
// values of the "last" item and only applying
// the sum logic for keys in sumKeys
const concats = sumKeys
  .reduce(
    (f, k) => (a, b) => Object.assign(f(a, b), {
      [k]: (a[k] || 0) + b[k]
    }),
    (a, b) => Object.assign({}, a, b)
  )

// Now, we take our data and group by the groupHash
const groups = arr.reduce(
  (groups, x) => {
    const k = groupHash(x);
    if (!groups[k]) groups[k] = [x];
    else groups[k].push(x);
    return groups;
  },
  {}
);

// These are the keys we want our final objects to have...
const allKeys = ["nrTotal"]
  .concat(groupKeys)
  .concat(sumKeys);
  
// ...baked in to a helper to remove other keys
const cleanKeys = obj => Object.assign(
  ...allKeys.map(k => ({ [k]: obj[k] }))
);

// With the items neatly grouped, we can reduce each
// group using the composed concatenator
const items = Object
  .values(groups)
  .flatMap(
    xs => cleanKeys(
      xs.reduce(concats, { nrTotal: xs.length })
    ),
  );

console.log(items);

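Here is an attempt at porting this to Ramda, but I did not get much further than replacing the vanilla JS methods with their Ramda equivalents. I am curious to see which nice utilities and functional concepts I have missed! I am sure someone with deeper knowledge of Ramda's specifics will chime in!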

const arr=[{"Global Id":"1231",TypeID:"FD1",Size:160,"Flöde":55},{"Global Id":"5433",TypeID:"FD1",Size:160,"Flöde":100},{"Global Id":"50433",TypeID:"FD1",Size:120,"Flöde":100},{"Global Id":"452",TypeID:"FD2",Size:120,"Flöde":100}],columns=[{dataField:"TypeID",summarize:!1},{dataField:"Size",summarize:!1},{dataField:"Flöde",summarize:!0}];


const [ sumCols, groupCols ] = R.partition(
  R.prop("summarize"), 
  columns
);

const groupKeys = R.pluck("dataField", groupCols);
const sumKeys = R.pluck("dataField", sumCols);

const grouper = R.reduce(
  (f, g) => x => `${f(x)}___${g(x)}`,
  R.always("GROUPKEY"),
  R.map(R.prop, groupKeys)
);

const reducer = R.reduce(
  (f, k) => (a, b) => R.mergeRight(
    f(a, b),
    { [k]: (a[k] || 0) + b[k] }
  ),
  R.mergeRight,
  sumKeys
);

const allowedKeys = new Set(
  [ "nrTotal" ].concat(sumKeys).concat(groupKeys)
);

const cleanKeys = R.pipe(
  R.toPairs,
  R.filter(([k, v]) => allowedKeys.has(k)),
  R.fromPairs
);

const items = R.flatten(
  R.values(
    R.map(
      xs => cleanKeys(
        R.reduce(
          reducer,
          { nrTotal: xs.length },
          xs
        )
      ),
      R.groupBy(grouper, arr)
    )
  )
);

console.log(items);

<script src="https://cdnjs.cloudflare.com/ajax/libs/ramda/0.26.1/ramda.min.js"></script>

Answer 2

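Here is my initial approach. Everything except summarize is a helper function, which I suppose you could inline if you really wanted to. I find it cleaner with this separation.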

const getKeys = (val) => pipe (
  filter (propEq ('summarize', val) ),
  pluck ('dataField')
) 

const keyMaker = (columns, keys = getKeys (false) (columns)) => pipe (
  pick (keys),
  JSON .stringify
)

const makeReducer = (
  columns,
  toSum = getKeys (true) (columns),
  toInclude = getKeys (false) (columns),
) => (a, b) => ({
  ...mergeAll (map (k => ({ [k]: b[k] }), toInclude ) ),
  ...mergeAll (map (k => ({ [k]: (a[k] || 0) + b[k] }), toSum ) ),
  nrOfItems: (a .nrOfItems || 0) + 1
})

const summarize = (columns) => pipe (
  groupBy (keyMaker (columns) ),
  values,
  map (reduce (makeReducer (columns), {} ))
)

const arr = [{"Flöde": 55, "Global Id": "1231", "Size": 160, "TypeID": "FD1"}, {"Flöde": 100, "Global Id": "5433", "Size": 160, "TypeID": "FD1"}, {"Flöde": 100, "Global Id": "50433", "Size": 120, "TypeID": "FD1"}, {"Flöde": 100, "Global Id": "452", "Size": 120, "TypeID": "FD2"}]
const columns = [{"dataField": "TypeID", "summarize": false}, {"dataField": "Size", "summarize": false}, {"dataField": "Flöde", "summarize": true}]

console .log (
  summarize (columns) (arr)
)

<script src="https://bundle.run/ramda@0.26.1"></script><script>
const {pipe, filter, propEq, pluck, pick, mergeAll, map, groupBy, values, reduce, assoc} = ramda</script>

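There is a lot of overlap with Joe's solution, but also some real differences. His was already posted when I saw the question, but I wanted my own approach to be uninfluenced, so I did not look until I had written the above. Note the difference in our hash functions. Mine does JSON.stringify on values like {TypeID: "FD1", Size: 160}, while Joe's creates "GROUPKEY___FD1___160". I think I like mine better for its simplicity. On the other hand, Joe's solution is definitely better than mine at handling nrOfItems. I update it on every reduce iteration and have to use || 0 to handle the initial case. Joe simply starts the fold with the already-known value. Overall, though, the solutions are quite similar.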
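To make the hash-function difference concrete, here is a minimal sketch of both key styles side by side. The names joinedKey and jsonKey are made up for illustration, and the code is plain JavaScript rather than either answer's exact implementation. It also shows one practical consequence: the joined string no longer distinguishes the number 160 from the string "160", while the JSON key does.

```javascript
// Illustration only: joinedKey and jsonKey are hypothetical names.
const groupKeys = ["TypeID", "Size"];

// Joined-string key, in the style of "GROUPKEY___FD1___160"
const joinedKey = (item) =>
  groupKeys.reduce((key, k) => `${key}___${item[k]}`, "GROUPKEY");

// JSON key built from only the grouping properties
const jsonKey = (item) =>
  JSON.stringify(Object.fromEntries(groupKeys.map((k) => [k, item[k]])));

const a = { TypeID: "FD1", Size: 160 };
const b = { TypeID: "FD1", Size: "160" }; // same Size, but as a string

console.log(joinedKey(a)); // GROUPKEY___FD1___160
console.log(jsonKey(a));   // {"TypeID":"FD1","Size":160}
console.log(joinedKey(a) === joinedKey(b)); // true: the type is lost
console.log(jsonKey(a) === jsonKey(b));     // false: JSON keeps the type
```

Whether that matters depends on the data. If every column has a consistent type, both keys put the same items in the same groups.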

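You mention wanting to reduce the number of passes through the data. The way I write Ramda code tends not to help with that. This code iterates over the whole list to group it into like items, then iterates over each of those groups to fold it down to a single value. (There is also a minor iteration in values.) These could certainly be changed to combine the two iterations. It might even make the code shorter. To my mind, though, it would become harder to understand.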

Update

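I was curious about the single-pass approach, and found that I could reuse all the infrastructure I had built for the multi-pass one, rewriting only the main function: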

const summarize2 = (columns) => (
  arr,
  makeKey = keyMaker (columns),
  reducer = makeReducer (columns)
) => values (reduce (
  (a, item, key = makeKey (item) ) => assoc (key, reducer (key in a ? a[key]: {}, item), a),
  {},
  arr
))

console .log (
  summarize2 (columns) (arr)
)

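I still would not choose this over the original unless testing showed that this code was a bottleneck in my application. But it is not as complex as I thought it would be, and it does everything in a single iteration (well, except for values). Interestingly, it changes my mind a bit about how the item count is handled. My helper code simply worked in this version, and I never had to know the total size of a group. That would not have happened if I had used Joe's approach.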
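One caveat about summarize2 for a list of 100k+ items: as far as I understand it, assoc returns a shallow copy of the accumulator on every call, so the work per item grows with the number of groups found so far. If profiling did show a bottleneck, a plain-JavaScript single pass that mutates a local Map would avoid those copies. The following is only a sketch under that assumption. The name summarizeOnePass is hypothetical, and the key is a JSON array of the grouping values rather than the keyMaker helper above.

```javascript
// Sketch only: summarizeOnePass is a hypothetical name, not part of Ramda.
const summarizeOnePass = (columns) => {
  const groupKeys = columns.filter((c) => !c.summarize).map((c) => c.dataField);
  const sumKeys = columns.filter((c) => c.summarize).map((c) => c.dataField);

  return (arr) => {
    const groups = new Map(); // key string -> running summary object
    for (const item of arr) {
      const key = JSON.stringify(groupKeys.map((k) => item[k]));
      let group = groups.get(key);
      if (group === undefined) {
        // First item of this group: copy grouping values, start sums at 0
        group = {};
        for (const k of groupKeys) group[k] = item[k];
        for (const k of sumKeys) group[k] = 0;
        group.nrOfItems = 0;
        groups.set(key, group);
      }
      for (const k of sumKeys) group[k] += item[k];
      group.nrOfItems += 1;
    }
    // A Map iterates in insertion order, so groups appear as first seen
    return [...groups.values()];
  };
};

const arr = [
  { "Global Id": "1231", TypeID: "FD1", Size: 160, "Flöde": 55 },
  { "Global Id": "5433", TypeID: "FD1", Size: 160, "Flöde": 100 },
  { "Global Id": "50433", TypeID: "FD1", Size: 120, "Flöde": 100 },
  { "Global Id": "452", TypeID: "FD2", Size: 120, "Flöde": 100 },
];
const columns = [
  { dataField: "TypeID", summarize: false },
  { dataField: "Size", summarize: false },
  { dataField: "Flöde", summarize: true },
];

console.log(summarizeOnePass(columns)(arr));
```

The mutation is confined to the Map created inside the function, so callers still see a pure function of columns and arr. The trade-off is that it gives up the reuse of the helper functions that made summarize2 so short.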

Ramdajs, group array with parameters
