I have this DataFrame with 3 columns: userId, date, generation
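I want to group these values by userId and date.
The problem is that the third column holds MapType values, and the requirement is to combine all of the MapType values into a single column. The final output should look like this: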
+-------+--------+----------------------------------------------------------------------------+
|userId |date    |generation                                                                  |
+-------+--------+----------------------------------------------------------------------------+
|1      |20160926|Map("screen_WiFi" -> 15.127, "upload_WiFi" -> 0.603, "total_WiFi" -> 19.551)|
|1      |20160926|Map("screen_2g" -> 0.573, "upload_2g" -> 0.466, "total_2g" -> 1.419)        |
|1      |20160926|Map("screen_3g" -> 10.084, "upload_3g" -> 80.515, "total_3g" -> 175.435)    |
+-------+--------+----------------------------------------------------------------------------+
Is there a way to solve this, or any possible workaround?
Answer 0 (score: 2)
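You can create a naive user-defined aggregate function (UDAF) that combines maps, and then use it as the aggregation function. Since you did not define how two values for the same key should be combined, I will assume that keys are unique, i.e. for each userId and date, no key appears in two different records: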
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

/**
  * UDAF combining maps, overriding any duplicate key with the "latest" value.
  * Note: UserDefinedAggregateFunction is deprecated as of Spark 3.0 in favor of Aggregator.
  * @param keyType   DataType of the map keys
  * @param valueType DataType of the map values
  * @tparam K key type
  * @tparam V value type
  */
class CombineMaps[K, V](keyType: DataType, valueType: DataType) extends UserDefinedAggregateFunction {
  override def inputSchema: StructType = new StructType().add("map", dataType)
  override def bufferSchema: StructType = inputSchema
  override def dataType: DataType = MapType(keyType, valueType)
  override def deterministic: Boolean = true
  // start every group with an empty map
  override def initialize(buffer: MutableAggregationBuffer): Unit = buffer.update(0, Map[K, V]())

  // naive implementation - assuming keys won't repeat, otherwise later value for key overrides earlier one
  override def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    val before = buffer.getAs[Map[K, V]](0)
    val toAdd = input.getAs[Map[K, V]](0)
    val result = before ++ toAdd
    buffer.update(0, result)
  }

  // merging two partial buffers is the same operation as adding one more input map
  override def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = update(buffer1, buffer2)
  override def evaluate(buffer: Row): Any = buffer.getAs[Map[K, V]](0)
}
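// instantiate a CombineMaps with the relevant types:
val combineMaps = new CombineMaps[String, Double](StringType, DoubleType)
// groupBy and aggregate (input is the DataFrame from the question);
// the alias keeps the original column name instead of an auto-generated one
val result = input.groupBy("userId", "date").agg(combineMaps(col("generation")).as("generation"))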
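To make the "later value overrides earlier one" behaviour concrete, here is a minimal sketch in plain Scala (no Spark needed) of what `before ++ toAdd` does across the rows of one group. The object name `MergeSketch`, both helper functions, and the sample values are hypothetical and only for illustration. `sumMerge` shows one possible alternative, assuming you would rather add up the values of duplicate keys; the question does not say which rule is wanted.

```scala
object MergeSketch {
  // Same rule as the UDAF's update step: on a duplicate key the later map wins.
  def overrideMerge[K, V](maps: Seq[Map[K, V]]): Map[K, V] =
    maps.foldLeft(Map.empty[K, V])(_ ++ _)

  // Hypothetical alternative: sum the values of duplicate keys instead of overriding.
  def sumMerge[K](maps: Seq[Map[K, Double]]): Map[K, Double] =
    maps.foldLeft(Map.empty[K, Double]) { (acc, m) =>
      m.foldLeft(acc) { case (a, (k, v)) => a.updated(k, a.getOrElse(k, 0.0) + v) }
    }

  def main(args: Array[String]): Unit = {
    // Illustrative values only; "upload_WiFi" is repeated on purpose.
    val rows = Seq(
      Map("screen_WiFi" -> 2.0, "upload_WiFi" -> 0.5),
      Map("screen_2g" -> 1.0, "upload_WiFi" -> 0.25)
    )
    println(overrideMerge(rows)("upload_WiFi")) // 0.25 (second map wins)
    println(sumMerge(rows)("upload_WiFi"))      // 0.75 (0.5 + 0.25)
  }
}
```

If summing turns out to be what you need, the same fold could replace `before ++ toAdd` inside `update`, at the cost of fixing the value type to a numeric one.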