Groovy - Aggregating and modeling data from a complex nested map

Time: 2015-03-04 12:08:02

Tags: list dictionary groovy

I have the following data in Groovy, shown in the snippet below:

def productAvailability = [
  [id: 1, startDate: "2014-12-22", endDate: "2015-01-02", storeId: 1, productId: 1, categoryId: 1],
  [id: 4, startDate: "2014-12-24", endDate: "2015-01-08", storeId: 2, productId: 1, categoryId: 1],
  [id: 8, startDate: "2014-12-25", endDate: "2015-01-01", storeId: 2, productId: 3, categoryId: 1],
  [id: 9, startDate: "2014-12-22", endDate: "2015-01-02", storeId: 1, productId: 3, categoryId: 1],
  [id: 10, startDate: "2015-01-10", endDate: "2015-01-21", storeId: 1, productId: 1, categoryId: 1]
];

The goal is to get the following result:

Product statistics

Product Id: 1 | Availability Index: 15 + 11 + 11 = 37.
   Longest Available Products (Sort By Past Start Date *first* then, Store Id): 
       1. "2014-12-24" to "2015-01-08" in store id 2. (15 days)
       2. "2014-12-22" to "2015-01-02" in store id 1. (11 days)
       3. "2015-01-10" to "2015-01-21" in store id 1. (11 days)
Product Id: 3 | Availability Index: 7 + 11 = 18.
   Longest Available Products (Sort By Past Start Date *first* then, Store Id): 
       1. "2014-12-22" to "2015-01-02" in store id 1. (11 days)
       2. "2014-12-25" to "2015-01-01" in store id 2. (7 days)

Store statistics

Store Id: 1 | Availability Index: 11 + 11 + 11 = 33.
   Most Available Product (sort by most available product, then sort by product id):
       1. Product Id: 3 on ["2014-12-22" to "2015-01-02"] (11 days)
       2. Product Id: 1 on ["2014-12-22" to "2015-01-02", "2015-01-10" to "2015-01-21"] (11 days)
Store Id: 2 | Availability Index: 15 + 7 = 22.
   Most Available Product (sort by most available product, then sort by product id):
       1. Product Id: 1 on ["2014-12-24" to "2015-01-08"] (15 days)
       2. Product Id: 3 on ["2014-12-25" to "2015-01-01"] (7 days)

Total Availability Index: 37 + 18 or 33 + 22 = 55.

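The printed results above are the product statistics and the store statistics. I am looking for an optimized, efficient, and easy-to-understand solution that prints them.

My attempt at getting results from the data above: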

// productAvailability => see the declaration variable above in the beginning of question!
List aggregateDates = productAvailability.collect({[
    storeId: it.storeId,
    productId: it.productId,
    availabilityIndex: Date.parse("yyyy-MM-dd", it.endDate) - Date.parse("yyyy-MM-dd", it.startDate) // "yyyy" is the calendar year; "YYYY" is the week-based year
]});
println "Total Availability Index: " + aggregateDates.clone().sum({ it.availabilityIndex });
println "Total Products: " +  aggregateDates.clone().unique({ it.productId }).count({ it.productId });
println "Total Stores: " + aggregateDates.clone().unique({ it.storeId }).count({ it.storeId });
println "Average Availability Index: " + aggregateDates.clone().sum({ it.availabilityIndex }) / aggregateDates.size();

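As you can see in the snippet above, I can easily get the total SUM, AVG, and COUNT of the products and stores in the productAvailability data. However, I am struggling to reach the goal above, which needs the results grouped by PRODUCT and by STORE using the date ranges.

See the code below, which uses date ranges.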
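The per-product and per-store availability indexes in the goal are sums of per-record day counts, so they can be checked with a plain groupBy before any date-range merging is attempted. The following is a minimal sketch, not the asker's code: it assumes Java 8 or later and uses java.time instead of Date.parse, and the names daysBetween, withDays, byProduct, and byStore are made up for illustration.

```groovy
import java.time.LocalDate
import java.time.temporal.ChronoUnit

def productAvailability = [
  [id: 1, startDate: "2014-12-22", endDate: "2015-01-02", storeId: 1, productId: 1, categoryId: 1],
  [id: 4, startDate: "2014-12-24", endDate: "2015-01-08", storeId: 2, productId: 1, categoryId: 1],
  [id: 8, startDate: "2014-12-25", endDate: "2015-01-01", storeId: 2, productId: 3, categoryId: 1],
  [id: 9, startDate: "2014-12-22", endDate: "2015-01-02", storeId: 1, productId: 3, categoryId: 1],
  [id: 10, startDate: "2015-01-10", endDate: "2015-01-21", storeId: 1, productId: 1, categoryId: 1]
]

// Hypothetical helper: whole days from start to end (end date not counted),
// which matches the day counts listed in the goal.
def daysBetween = { String start, String end ->
    ChronoUnit.DAYS.between(LocalDate.parse(start), LocalDate.parse(end)) as int
}

// Copy each record and attach its day count.
def withDays = productAvailability.collect { it + [days: daysBetween(it.startDate, it.endDate)] }

// Availability index per group: the sum of the day counts in that group.
def byProduct = withDays.groupBy { it.productId }.collectEntries { k, v -> [(k): v.days.sum()] }
def byStore   = withDays.groupBy { it.storeId }.collectEntries { k, v -> [(k): v.days.sum()] }

println "Per product: $byProduct"
println "Per store:   $byStore"
println "Total:       ${withDays.days.sum()}"
```

With the sample data, the per-product sums (37 and 18) and the per-store sums (33 and 22) both add up to the total of 55 stated in the goal. This sketch does not handle overlapping ranges within a group; it only confirms the arithmetic.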

def dailyDatesAvailability = [:] as Map<Date, Integer>;
def dailyStoresAvailability = [:].withDefault {0} as Map<Integer, Integer>;
def dailyProductsAvailability  = [:].withDefault {0} as Map<Integer, Integer>;
(Date.parse("yyyy-MM-dd", "2014-12-01")).upto((Date.parse("yyyy-MM-dd", "2015-01-30"))) { Date runningDate ->
        dailyDatesAvailability[runningDate] = 0;
        productAvailability.each({ _availability ->
            def _startDate = Date.parse("yyyy-MM-dd", _availability.startDate);
            def _endDate = Date.parse("yyyy-MM-dd", _availability.endDate);
            if (_startDate <= runningDate && _endDate >= runningDate) {
                dailyDatesAvailability[runningDate]++;
                dailyProductsAvailability[_availability.productId]++;
                dailyStoresAvailability[_availability.storeId]++;
            }

         // Do something here to get the MOST available PRODUCT in a STORE with date ranges
        });
       /// or do something here....?
    }

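What is the best way to print the goal using Groovy? Please share a code snippet that can be tested.

1 Answer:

Answer 0: (score: 2)

Got interested in this, and came up with: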

List<Range> simplify( List<Range> ranges ) {
  ranges.drop( 1 ).inject( ranges.take( 1 ) ) { r, curr ->
    // Find an overlapping range
    def ov = r.find { curr.from <= it.to && curr.to >= it.from }
    if( ov ) {
      ov.from = [ curr.from, ov.from ].min()
      ov.to   = [ curr.to, ov.to ].max()
      simplify( r )
    }
    else {
      r << curr
    }
  }
}

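// Groups the records by `primary`, then by `secondary` within each group,
// merges overlapping date ranges, and sorts by days (descending), then by secondary id
def manipulate(data, primary, secondary) {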
    data.groupBy { it."$primary" }
        .collect { id, vals ->
            def joined = vals.collect { it ->
                [ id: it.id,
                  range: Date.parse('yyyy-MM-dd', it.startDate)..Date.parse('yyyy-MM-dd', it.endDate),
                  key: secondary,
                  value: it."$secondary" ]
            }.groupBy { it.value }
             .collectMany { sid, ran -> simplify(ran.range).collect { [key: secondary, value: sid, range:it, days:(it.to - it.from)] } }
             .sort { a, b -> b.days <=> a.days ?: a.value - b.value }
            [name:primary, id:id, data:joined]
        }
}

// Prints each group with its ranked ranges and returns the list of per-group sums
def dump(data) {
    data.collect { a ->
        def sum = a.data.days.sum()
        println "$a.name: $a.id | availability index ${a.data.days.join(' + ')} = ${sum}"
        a.data.eachWithIndex { row, idx ->
            println "    ${idx+1}. ${row.range.from.format('yyyy-MM-dd')} to ${row.range.to.format('yyyy-MM-dd')} in $row.key $row.value ($row.days days)"
        }
        sum
    }
}

def productAvailability = [
  [id: 1, startDate: "2014-12-22", endDate: "2015-01-02", storeId: 1, productId: 1, categoryId: 1],
  [id: 4, startDate: "2014-12-24", endDate: "2015-01-08", storeId: 2, productId: 1, categoryId: 1],
  [id: 8, startDate: "2014-12-25", endDate: "2015-01-01", storeId: 2, productId: 3, categoryId: 1],
  [id: 9, startDate: "2014-12-22", endDate: "2015-01-02", storeId: 1, productId: 3, categoryId: 1],
  [id: 10, startDate: "2015-01-10", endDate: "2015-01-21", storeId: 1, productId: 1, categoryId: 1]
];

def p = dump(manipulate(productAvailability, 'productId', 'storeId'))
println ''
def s = dump(manipulate(productAvailability, 'storeId', 'productId'))
println ''
println "Total Availability Index: ${p.join(' + ')} or ${s.join(' + ')} = ${[p.sum(), s.sum()].max()}"

This prints:

productId: 1 | availability index 15 + 11 + 11 = 37
    1. 2014-12-24 to 2015-01-08 in storeId 2 (15 days)
    2. 2014-12-22 to 2015-01-02 in storeId 1 (11 days)
    3. 2015-01-10 to 2015-01-21 in storeId 1 (11 days)
productId: 3 | availability index 11 + 7 = 18
    1. 2014-12-22 to 2015-01-02 in storeId 1 (11 days)
    2. 2014-12-25 to 2015-01-01 in storeId 2 (7 days)

storeId: 1 | availability index 11 + 11 + 11 = 33
    1. 2014-12-22 to 2015-01-02 in productId 1 (11 days)
    2. 2015-01-10 to 2015-01-21 in productId 1 (11 days)
    3. 2014-12-22 to 2015-01-02 in productId 3 (11 days)
storeId: 2 | availability index 15 + 7 = 22
    1. 2014-12-24 to 2015-01-08 in productId 1 (15 days)
    2. 2014-12-25 to 2015-01-01 in productId 3 (7 days)

Total Availability Index: 37 + 18 or 33 + 22 = 55
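
The sample data contains no overlapping ranges within a single product/store pair, so the merging step performed by simplify is not exercised by the output above. As an illustration of that step only, here is a minimal sketch of an alternative that builds new intervals rather than modifying existing ones. It is not the answer's code: it assumes Java 8 or later, represents each interval as a map with LocalDate values under from and to, and mergeIntervals is a hypothetical name. Like the answer, it treats ranges that merely touch as overlapping.

```groovy
import java.time.LocalDate
import java.time.temporal.ChronoUnit

// Hypothetical helper: merges overlapping [from: LocalDate, to: LocalDate] intervals.
// Sorting by start date first means each interval only has to be compared
// with the most recently merged one.
List<Map> mergeIntervals(List<Map> intervals) {
    intervals.sort(false) { it.from }.inject([]) { merged, cur ->
        def last = merged ? merged[merged.size() - 1] : null
        if (last && !cur.from.isAfter(last.to)) {
            // Overlap: replace the last interval with a widened copy.
            merged.set(merged.size() - 1, [from: last.from, to: [last.to, cur.to].max()])
        } else {
            merged << [from: cur.from, to: cur.to]
        }
        merged
    }
}

def d = { String s -> LocalDate.parse(s) }
def merged = mergeIntervals([
    [from: d("2014-12-22"), to: d("2015-01-02")],
    [from: d("2015-01-10"), to: d("2015-01-21")],
    [from: d("2014-12-30"), to: d("2015-01-05")]   // made-up record that overlaps the first one
])
merged.each {
    println "${it.from} to ${it.to} (${ChronoUnit.DAYS.between(it.from, it.to)} days)"
}
```

In this made-up example the first and third intervals collapse into one running from 2014-12-22 to 2015-01-05, and the January interval stays separate. The input list is left unchanged because sort(false) returns a sorted copy.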