Question

I have a DataFrame with this schema:

id      user        keywords
1       u1, u2      key1, key2  
1       u3, u4      key3, key4
1       u5, u6      key5, key6
2       u7, u8      key7, key8
2       u9, u10     key9, key10
3       u11, u12    key11, key12
3       u13, u14    key13, key14

I need a way to group the rows by id and concatenate the strings in the user and keywords columns, so that the result looks like this:

id      user                            keywords
1       u1, u2, u3, u4, u5, u6          key1, key2, key3, key4, key5, key6
2       u7, u8, u9, u10                 key7, key8, key9, key10
3       u11, u12, u13, u14              key11, key12, key13, key14

How can I do this in Java?

Answer 1

Do something like this:

Map each row to a pair (id, (user, keywords))
groupByKey on this RDD
Some flatMap over the grouped users and keywords to concatenate them
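
The three steps above can be sketched in plain Java, without Spark, to show the intended data flow. This is only a model of the idea, not Spark code: the class `GroupConcat`, the `Row` type and the method `groupAndConcat` are hypothetical names made up for this illustration, and an in-memory `Map` stands in for the keyed RDD.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupConcat {

    /** One input row (hypothetical type for this sketch). */
    static final class Row {
        final int id;
        final String user;
        final String keywords;

        Row(int id, String user, String keywords) {
            this.id = id;
            this.user = user;
            this.keywords = keywords;
        }
    }

    /** Returns id -> {joined users, joined keywords}, keeping first-seen id order. */
    static Map<Integer, String[]> groupAndConcat(List<Row> rows) {
        // Steps 1 and 2: key every row by id and gather the rows that share a key.
        Map<Integer, List<Row>> grouped = new LinkedHashMap<>();
        for (Row r : rows) {
            grouped.computeIfAbsent(r.id, k -> new ArrayList<>()).add(r);
        }

        // Step 3: flatten the comma-separated values of each group, then re-join them.
        Map<Integer, String[]> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, List<Row>> e : grouped.entrySet()) {
            List<String> users = new ArrayList<>();
            List<String> keywords = new ArrayList<>();
            for (Row r : e.getValue()) {
                users.addAll(split(r.user));
                keywords.addAll(split(r.keywords));
            }
            result.put(e.getKey(), new String[] {
                String.join(", ", users), String.join(", ", keywords)
            });
        }
        return result;
    }

    /** Splits "a, b" into ["a", "b"], dropping surrounding whitespace and empty parts. */
    private static List<String> split(String s) {
        List<String> out = new ArrayList<>();
        for (String part : s.split(",")) {
            String t = part.trim();
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row(1, "u1, u2", "key1, key2"),
            new Row(1, "u3, u4", "key3, key4"),
            new Row(2, "u7, u8", "key7, key8"));
        groupAndConcat(rows).forEach((id, cols) ->
            System.out.println(id + "\t" + cols[0] + "\t" + cols[1]));
    }
}
```

In Spark itself, the first two steps would correspond to `mapToPair` on a `JavaRDD` and `groupByKey` on the resulting `JavaPairRDD`. With the DataFrame API, an alternative that avoids RDDs may be `df.groupBy("id").agg(concat_ws(", ", collect_list(col("user"))).as("user"), concat_ws(", ", collect_list(col("keywords"))).as("keywords"))`, using the static functions in `org.apache.spark.sql.functions`. Note that `collect_list` does not guarantee the order of the collected values after a shuffle.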

GroupBy and concatenate DataFrame rows for Apache Spark in Java

1 answer: