How to merge structured data items in a set into groups?

Time: 2017-05-03 04:51:54

Tags: algorithm grouping data-manipulation data-processing

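Here are the conditions:

I have a set of items that are all instances of the same class. The class has several properties, and each property has a limited set of possible values.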

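For example, if the class has 2 properties (p1, p2) with 2 values each (A, B / 1, 2), there are only 4 kinds of instances: {p1: "A", p2: 1}, {p1: "A", p2: 2}, {p1: "B", p2: 1}, {p1: "B", p2: 2}.

To display an item as a string, the item {p1: "A", p2: 1} becomes "itemA1" and {p1: "B", p2: 2} becomes "itemB2".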
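A minimal sketch of that item-to-string rule, assuming items are plain objects and that the display order of the properties comes from a hypothetical `structure` object (property name to list of possible values, the same shape as the input of `createItemCombinations` in the last answer). `itemString` is an illustrative name, not from the question:

```javascript
// Hypothetical description of the class: property -> possible values, in display order
const structure = { p1: ["A", "B"], p2: [1, 2] };

// Build the display string of a single item by concatenating its values
// in the property order given by `structure`.
function itemString(structure, item) {
  return "item" + Object.keys(structure).map(p => String(item[p])).join("");
}

console.log(itemString(structure, { p1: "A", p2: 1 })); // itemA1
console.log(itemString(structure, { p1: "B", p2: 2 })); // itemB2
```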

So a set of unique items has 16 possibilities (2^4), from the empty set ([]) to the universal set ([{p1: "A", p2: 1}, {p1: "A", p2: 2}, {p1: "B", p2: 1}, {p1: "B", p2: 2}]).
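The count of 16 follows from each of the 4 instances being either in or out of the set. A small sketch that enumerates the subsets with a bitmask (the helper name `allSubsets` is illustrative):

```javascript
// The four possible instances, written as in the question's display strings
const instances = ["A1", "A2", "B1", "B2"];

// Enumerate every subset with a bitmask: bit i set => instances[i] is included.
function allSubsets(items) {
  const subsets = [];
  for (let mask = 0; mask < 2 ** items.length; mask++) {
    subsets.push(items.filter((_, i) => (mask >> i) & 1));
  }
  return subsets;
}

const subsets = allSubsets(instances);
console.log(subsets.length);                                  // 16
console.log(JSON.stringify(subsets[0]));                      // []
console.log(JSON.stringify(subsets[subsets.length - 1]));     // ["A1","A2","B1","B2"]
```

This only works for a handful of instances: the larger example below has 2 × 3 × 3 × 9 = 162 instances, so there are 2^162 possible sets, and display strings have to be computed from the given set rather than looked up in a precomputed table.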

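When displaying a set, if the set contains items covering every possible value of one property while their other properties have the same values, that property is hidden.

For example, the set [{p1: "A", p2: 1}, {p1: "A", p2: 2}] contains every possible value of p2 with the same value of p1. It is displayed as "itemA", with p2 hidden.

Conversely, [{p1: "A", p2: 2}, {p1: "B", p2: 2}] is displayed as "item2".

The universal set [{p1: "A", p2: 1}, {p1: "A", p2: 2}, {p1: "B", p2: 1}, {p1: "B", p2: 2}] is just "item".

When the items cannot all be grouped together, the set [{p1: "A", p2: 1}, {p1: "A", p2: 2}, {p1: "B", p2: 2}] is displayed as "itemA&B2" or "item2&A1", depending on configuration.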
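Whether a set forms a single group can be checked without enumerating anything: a property whose value is the same in every item is shown, every other property is a candidate for hiding, and (since the items are unique) the set is one group exactly when its size equals the product of the value counts of the varying properties. A sketch under those assumptions; `structure` and `singleGroupLabel` are hypothetical names, and the function returns `null` for sets like the last example that need more than one group:

```javascript
// Hypothetical description of the class: property -> possible values
const structure = { p1: ["A", "B"], p2: [1, 2] };

// Return the display string if the unique `items` form ONE group under the
// hiding rule, otherwise null.
function singleGroupLabel(structure, items) {
  if (items.length === 0) return null;
  let label = "item";
  let expectedSize = 1;
  for (const [p, values] of Object.entries(structure)) {
    const seen = new Set(items.map(it => it[p]));
    if (seen.size === 1) {
      label += String(items[0][p]);  // same value everywhere: show it
    } else {
      expectedSize *= values.length; // varies: must cover every value to be hidden
    }
  }
  // Unique items that agree on the shown properties reach expectedSize
  // only if every combination of the varying properties' values is present.
  return items.length === expectedSize ? label : null;
}

const A1 = { p1: "A", p2: 1 }, A2 = { p1: "A", p2: 2 };
const B1 = { p1: "B", p2: 1 }, B2 = { p1: "B", p2: 2 };
console.log(singleGroupLabel(structure, [A1, A2]));         // itemA
console.log(singleGroupLabel(structure, [A2, B2]));         // item2
console.log(singleGroupLabel(structure, [A1, A2, B1, B2])); // item
console.log(singleGroupLabel(structure, [A1, A2, B2]));     // null
```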

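The question is:

What is a good algorithm for implementing these rules with more properties and values, e.g. {p1: 1|2, p2: 1|2|3, p3: 1|2|3, p4: 1|2|3|4|5|6|7|8|9}?

At the moment I can only think of looping over each value while keeping the other properties fixed, building subsets, and displaying each subset. But this seems to loop many times and to produce a great many subsets. On top of that, I would first have to define a priority order to decide which of the possible subsets to use.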
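To put numbers on "so many subsets": every candidate group either fixes a property to one of its values or hides it, so a property with k values contributes k + 1 choices. A small arithmetic sketch for the larger example (the `countCandidates` helper is hypothetical):

```javascript
// Larger example from the question: property -> possible values
const big = {
  p1: [1, 2],
  p2: [1, 2, 3],
  p3: [1, 2, 3],
  p4: [1, 2, 3, 4, 5, 6, 7, 8, 9],
};

// Number of distinct items, and number of candidate groups
// (each property is either fixed to one of its values or hidden).
function countCandidates(structure) {
  const sizes = Object.values(structure).map(vs => vs.length);
  return {
    items: sizes.reduce((acc, k) => acc * k, 1),
    groups: sizes.reduce((acc, k) => acc * (k + 1), 1),
  };
}

console.log(JSON.stringify(countCandidates(big))); // {"items":162,"groups":480}
```

At this size, enumerating the 480 candidate groups is cheap; the harder part is choosing among overlapping candidates, which is the priority order mentioned above.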

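This is a complex problem for me, although it seems like a basic one.

If anyone is interested in this problem, that would be great. Thanks for reading.

3 Answers:

Answer 0 (score: 0)

How about the following steps?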

from itertools import product

# Example class: 2 properties (p1, p2) and their possible values
properties = {"p1": ["A", "B"], "p2": [1, 2]}

def item_str(item):
    # class item { p1: "A", p2: 1 } becomes 'A1'
    return "".join(str(item[p]) for p in properties)

# Start from the string form of every item in the given set, e.g.
# [{p1: "A", p2: 1}, {p1: "A", p2: 2}, {p1: "B", p2: 2}] -> ['A1', 'A2', 'B2']
groups = ["A1", "A2", "B2"]

# Loop over all class properties [p1, p2] and, for each, its values [A, B] / [1, 2]
for p, values in properties.items():
    others = [q for q in properties if q != p]
    for v in values:
        # All combinations that keep property p at value v:
        # for 'A' -> ['A1', 'A2']; for 'B' -> ['B1', 'B2']
        combinations = [
            item_str({p: v, **dict(zip(others, rest))})
            for rest in product(*(properties[q] for q in others))
        ]
        if all(c in groups for c in combinations):
            # Remove all these combinations and insert the fixed value instead
            for c in combinations:
                groups.remove(c)
            groups.append(str(v))

# The resulting groups, here ['B2', 'A'], give the display string itemA&B2
print(groups)

Hope it helps!

Answer 1 (score: 0)

Another approach is to use a queue data structure to reduce the elements one dimension at a time.

Algorithm:

1. Loop through all items in the given set
      - generate a string for each item and store it in an array
# After this loop, the set [ { p1: "A", p2: 1 }, { p1: "A", p2: 2 }, { p1: "B", p2: 2 } ]
# will generate something like: items = ['A1','A2','B2']

2. Sort the array. # This is important to keep the order when reducing dimensions.

3. Enqueue (put) all elements in the 'Queue'.

4. Dequeue (get) the run of elements from the 'Queue' whose strings are identical except for the last character. # Effectively, we get all the A's, then all the B's.

5. If the number of elements in this run equals the maximum for this case (the number of possible values of the last property), we have identified a reduction. So take one element of the run, remove its last character, and enqueue (put) it in the 'Queue' again.

6. Otherwise, add all the elements of the run to the 'Group' list.

7. Repeat steps 4-6 until the queue is empty.

8. The 'Group' list will hold the result as required.
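A sketch of these steps in JavaScript, with two assumptions of mine: items are arrays of values in property order (e.g. ["A", 1]) rather than strings, so multi-character values also work, and "maximum" in step 5 means the number of possible values of the property being removed. The function names and the output ordering of `formatTuples` are hypothetical:

```javascript
// Any consistent lexicographic order works: we only need tuples with equal
// prefixes to end up next to each other after sorting (step 2).
function compareTuples(a, b) {
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    const x = String(a[i]), y = String(b[i]);
    if (x !== y) return x < y ? -1 : 1;
  }
  return a.length - b.length;
}

// items: arrays of values; values: possible values of each property, in order
function reduceWithQueue(items, values) {
  const queue = items.map(t => t.slice()).sort(compareTuples); // steps 1-3
  const groups = [];
  while (queue.length > 0) {
    // Step 4: take the run at the front that shares everything but the last value.
    const head = queue[0];
    const prefix = head.slice(0, -1);
    let n = 0;
    while (n < queue.length && queue[n].length === head.length &&
           prefix.every((v, i) => queue[n][i] === v)) {
      n++;
    }
    const batch = queue.splice(0, n);
    if (head.length > 0 && batch.length === values[head.length - 1].length) {
      queue.push(prefix);    // step 5: every value present, drop the last property
    } else {
      groups.push(...batch); // step 6: keep these elements as they are
    }
  }
  return groups;             // step 8
}

// Hypothetical formatter: shorter (more reduced) groups first, joined with "&".
function formatTuples(groups) {
  return "item" + [...groups].sort((a, b) => a.length - b.length)
    .map(g => g.join("")).join("&");
}

const values = [["A", "B"], [1, 2]];
console.log(formatTuples(reduceWithQueue([["A", 1], ["A", 2], ["B", 2]], values)));             // itemA&B2
console.log(formatTuples(reduceWithQueue([["A", 1], ["A", 2], ["B", 1], ["B", 2]], values)));   // item
console.log(formatTuples(reduceWithQueue([["A", 2], ["B", 2]], values)));                       // itemA2&B2
```

The last call shows a limitation of the steps as written: because only the last position is ever removed, a reduction that hides an earlier property while a later one stays fixed (here [A2, B2] to "item2") is not found.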

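This algorithm runs in O(n * m) (plus the initial sort), where n is the number of elements in the set and m is the number of properties.

Hope it helps!

Answer 2 (score: 0)

Thanks to Arun Kumar for the suggested solution.

My solution for 2 properties with 2 values each: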

/* pseudo-javascript-ish code */

// prepare data
properties = [ p1, p2 ]

values = { p1: [A,B], p2: [1,2] }

// groups
property_combinations = [ [], [p2], [p1], [p1,p2] ]

// combinations
value_combinations = [
  [ [A1], [A2], [B1], [B2] ], // []
  [ [A1,B1], [A2,B2] ],       // [p2]
  [ [A1,A2], [B1,B2] ],       // [p1]
  [ [A1,A2,B1,B2] ]           // [p1,p2]
]

// the groups and combinations could be generated by a function

function getSetItemString ( data ) {

  data = data.slice()   // work on a copy so the caller's set is not modified
  let subgroups = []

  nest:
  for( let i = value_combinations.length - 1; i >= 0; i-- ){

    for( let j = 0; j < value_combinations[i].length; j++ ){

      let combination = value_combinations[i][j]

      // matching items: does data contain every item of the combination?
      if( combination.every( item => data.includes( item ) ) ){

        // remove the matched items from data and add the combination to subgroups
        data = data.filter( item => !combination.includes( item ) )
        subgroups.push( combination )

      }

      if( data.length == 0 ){ break nest }

    }

  }

  // some function turns the subgroups into a string, e.g. [ [A1,A2], [B1] ] => itemA&B1
  return toItemString( subgroups )

}

getSetItemString( [ A1, A2, B1 ] )
// returns itemA&B1

// flow:
// subgroup [p1,p2]
// [ A1, A2, B1 ] does not match [A1,A2,B1,B2]

// subgroup [p1]
// [ A1, A2, B1 ] matches [A1,A2] => subgroups = [ [A1,A2] ]; set = [ B1 ]
// [ B1 ] does not match [B1,B2]

// subgroup [p2]
// [ B1 ] does not match [A1,B1]
// [ B1 ] does not match [A2,B2]

// subgroup []
// [ B1 ] does not match [A1]
// [ B1 ] does not match [A2]
// [ B1 ] matches [B1] => subgroups = [ [A1,A2], [B1] ]; set = [ ]
// [ ] breaks the loop

// output
// [ [A1,A2], [B1] ] => itemA&B1

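The complexity depends on generating the combination matrix and on matching the items of the given set against it.

However, working out the exact complexity is a bit complicated for me; it looks large.

The order of the subgroups follows the order in value_combinations.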
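A self-contained sketch that generalizes this approach to any number of properties: it enumerates every pattern (each property either fixed to one value or hidden, marked with a hypothetical `"*"`), tries patterns with more hidden properties first, and greedily takes each pattern whose items are all still in the set. Listing fixed values before `"*"` for each property makes the 2 × 2 case try the groups in the same order as value_combinations above. The helper names are my own, and like the original this is greedy, so it is not guaranteed to produce the fewest groups in every case:

```javascript
const HIDDEN = "*"; // marks a hidden property (assumes no real value is "*")

// Every pattern: each property fixed to one value, or HIDDEN (fixed values first).
function allPatterns(structure) {
  let patterns = [{}];
  for (const [p, values] of Object.entries(structure)) {
    patterns = patterns.flatMap(pat => [...values, HIDDEN].map(v => ({ ...pat, [p]: v })));
  }
  return patterns;
}

// All concrete items covered by a pattern.
function expand(structure, pattern) {
  let items = [{}];
  for (const [p, values] of Object.entries(structure)) {
    const choices = pattern[p] === HIDDEN ? values : [pattern[p]];
    items = items.flatMap(it => choices.map(v => ({ ...it, [p]: v })));
  }
  return items;
}

// Greedy grouping of unique items, larger groups first.
function groupItems(structure, items) {
  const props = Object.keys(structure);
  const key = it => JSON.stringify(props.map(p => it[p]));
  const hiddenCount = pat => props.filter(p => pat[p] === HIDDEN).length;
  const remaining = new Set(items.map(key));
  // Array.prototype.sort is stable, so ties keep the allPatterns order.
  const patterns = allPatterns(structure).sort((a, b) => hiddenCount(b) - hiddenCount(a));
  const groups = [];
  for (const pat of patterns) {
    if (remaining.size === 0) break;
    const keys = expand(structure, pat).map(key);
    if (keys.every(k => remaining.has(k))) {
      keys.forEach(k => remaining.delete(k));
      groups.push(pat);
    }
  }
  return groups;
}

// Shown values of each group concatenated; groups joined with "&".
function formatGroups(structure, groups) {
  const props = Object.keys(structure);
  return "item" + groups
    .map(g => props.filter(p => g[p] !== HIDDEN).map(p => String(g[p])).join(""))
    .join("&");
}

const structure = { p1: ["A", "B"], p2: [1, 2] };
const [A1, A2, B1, B2] = [["A", 1], ["A", 2], ["B", 1], ["B", 2]]
  .map(([p1, p2]) => ({ p1, p2 }));
console.log(formatGroups(structure, groupItems(structure, [A1, A2, B1])));     // itemA&B1
console.log(formatGroups(structure, groupItems(structure, [A1, A2, B2])));     // itemA&B2
console.log(formatGroups(structure, groupItems(structure, [A2, B2])));         // item2
console.log(formatGroups(structure, groupItems(structure, [A1, A2, B1, B2]))); // item
```

On the complexity question: expanding every pattern once touches, summed over all patterns, the product of 2k over the properties (k being each property's value count). For the larger example that is 4 × 6 × 6 × 18 = 2,592 items per call at most, and the `every` check usually stops earlier.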

To generate the combination matrix for more properties and values, I finally came up with this function:

function createItemCombinations(structure){ // input example: {p1: ["A","B"], p2: [1,2,3], p3: ["+","-"]}
  let props = Object.keys(structure),
      prop_combs = nm( props.map(p => 1) ).map(ps => ps.reduce((r,v,p) => r.concat(v? props[p] : []), [])),
      value_combs = prop_combs.map(sub => {
        let mutable = Object.entries(structure).reduce((r,[p,vs]) => {
              if(sub.indexOf(p) == -1){
                return Object.assign(r, {[p]: vs})
              }else{
                return r
              }
            }, {}),
            immutable = Object.entries(structure).reduce((r,[p,vs]) => {
              if(sub.indexOf(p) != -1){
                return Object.assign(r, {[p]: vs})
              }else{
                return r
              }
            }, {}),
            mu_keys = Object.keys(mutable),
            im_keys = Object.keys(immutable),
            mu_combs = nm( mu_keys.map(k => mutable[k].length - 1) ).map(m => m.reduce((r,v,i) => Object.assign(r, {[mu_keys[i]]: mutable[mu_keys[i]][v]}), {})),
            im_combs = nm( im_keys.map(k => immutable[k].length - 1) ).map(m => m.reduce((r,v,i) => Object.assign(r, {[im_keys[i]]: immutable[im_keys[i]][v]}), {}))
        return im_combs.map(im => {
          return { items: mu_combs.map(mu => Object.assign({}, mu, im)), subgroup: Object.assign({}, im) }
        })
      })
  return value_combs
}
// nm is a function which receives an array like [1,1,1] which mean 3 properties with 2 values each, and returns an array with all combinations in indexes like [ [0,0,0], [0,0,1], [0,1,0], ... [1,1,1] ].
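The answer does not show `nm`. One possible implementation of the behaviour described in that comment (a mixed-radix counter in which the last index changes fastest), offered as an assumption rather than the author's code:

```javascript
// nm(maxIndexes): all index tuples t with 0 <= t[i] <= maxIndexes[i],
// last position varying fastest. nm([]) returns [[]], which
// createItemCombinations relies on when a subgroup has no mutable properties.
function nm(maxIndexes) {
  let combos = [[]];
  for (const max of maxIndexes) {
    const next = [];
    for (const prefix of combos) {
      for (let i = 0; i <= max; i++) next.push([...prefix, i]);
    }
    combos = next;
  }
  return combos;
}

console.log(nm([1, 1, 1]).length);        // 8
console.log(JSON.stringify(nm([1, 2])));  // [[0,0],[0,1],[0,2],[1,0],[1,1],[1,2]]
```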