Question

Running

rdd.zipWithIndex().map { case ((a,b),c) => (a,b,c)}.collect()

works fine.

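What if (a,b) is actually a Product with Serializable, e.g. a tuple of 2, 3 or 4 elements, or even N elements? How do I write a case for an arbitrary number of values? With a List?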

Answer 1

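I used a List instead of an RDD, but the core logic stays the same. Use the tuple's productIterator method to achieve this.

import org.apache.spark.sql.Row

// Append the index to the tuple's elements, whatever the tuple's arity
rdd.zipWithIndex.map {
  case (tuple, idx) =>
    Row.fromSeq(tuple.productIterator.toSeq :+ idx)
}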

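The idea is not to destructure the tuple but to keep it as it is, take its elements with "productIterator" and append the index to them. This works for any type of data in the RDD records, and it also supports complex data structures.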
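A minimal plain-Scala sketch of the same idea, assuming a `List` stands in for the RDD (as the answer itself did) so it runs without Spark. The object name `AppendIndex`, the sample records and the helper `withIndex` are hypothetical. Note that `List.zipWithIndex` yields an `Int` index, whereas `RDD.zipWithIndex` yields a `Long`.

```scala
object AppendIndex {
  // Hypothetical sample records: tuples of different arities
  val records: List[Product with Serializable] =
    List((1, "a"), (2, "b", 3.0), (4, "c", 5.0, true))

  // Keep each tuple whole, turn its fields into a List and append the index
  def withIndex(rows: List[Product]): List[List[Any]] =
    rows.zipWithIndex.map { case (tuple, idx) =>
      tuple.productIterator.toList :+ idx
    }

  def main(args: Array[String]): Unit =
    withIndex(records).foreach(fields => println(fields.mkString("(", ",", ")")))
}
```

In Spark, the same body can go inside `rdd.zipWithIndex.map`, wrapped in `Row.fromSeq` as in the answer if a `Row` is wanted.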

Answer 2

First I want to show what you can do, and then explain why you shouldn't do it.

You can do this:

// The cases below were generated with:
// for (arity <- 2 to 20) {
//   println(
//     "case (" +
//     (0 until arity).map(i => ('a' + i).toChar).mkString("(", ",", ")") +
//     ",idx) => (" +
//     (0 until arity).map(i => ('a' + i).toChar).mkString(",") +
//     ",idx)"
//   )
// }
rdd.zipWithIndex.map{
  case ((a,b),idx) => (a,b,idx)
  case ((a,b,c),idx) => (a,b,c,idx)
  case ((a,b,c,d),idx) => (a,b,c,d,idx)
  case ((a,b,c,d,e),idx) => (a,b,c,d,e,idx)
  case ((a,b,c,d,e,f),idx) => (a,b,c,d,e,f,idx)
  case ((a,b,c,d,e,f,g),idx) => (a,b,c,d,e,f,g,idx)
  case ((a,b,c,d,e,f,g,h),idx) => (a,b,c,d,e,f,g,h,idx)
  case ((a,b,c,d,e,f,g,h,i),idx) => (a,b,c,d,e,f,g,h,i,idx)
  case ((a,b,c,d,e,f,g,h,i,j),idx) => (a,b,c,d,e,f,g,h,i,j,idx)
  case ((a,b,c,d,e,f,g,h,i,j,k),idx) => (a,b,c,d,e,f,g,h,i,j,k,idx)
  case ((a,b,c,d,e,f,g,h,i,j,k,l),idx) => (a,b,c,d,e,f,g,h,i,j,k,l,idx)
  case ((a,b,c,d,e,f,g,h,i,j,k,l,m),idx) => (a,b,c,d,e,f,g,h,i,j,k,l,m,idx)
  case ((a,b,c,d,e,f,g,h,i,j,k,l,m,n),idx) => (a,b,c,d,e,f,g,h,i,j,k,l,m,n,idx)
  case ((a,b,c,d,e,f,g,h,i,j,k,l,m,n,o),idx) => (a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,idx)
  case ((a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p),idx) => (a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,idx)
  case ((a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q),idx) => (a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,idx)
  case ((a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r),idx) => (a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,idx)
  case ((a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s),idx) => (a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,idx)
  case ((a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t),idx) => (a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,idx)
  case _ => throw new Error("not a tuple")
}

Given that the source code of Product2 through Product22 is itself generated, this might look like an ugly but acceptable solution at first glance.

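However, note that once you do this, you end up in exactly the same situation in the next step: you again have an RDD full of tuples of different arities. What do you do then? The same horrible match/case again? If you keep doing this, it will infect your entire codebase.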
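A small sketch of the problem described above, again with a `List` in place of the RDD and only arities 2 and 3 to keep it short. The names `MatchSpreads` and `lastElement` are hypothetical. The result of the first match is still statically a `Product with Serializable`, so the next step (here, reading the index back) needs another match over every arity.

```scala
object MatchSpreads {
  // Hypothetical sample records of mixed arity
  val records: List[Product with Serializable] = List((1, "a"), (2, "b", 3.0))

  // Step 1: flatten (tuple, idx) with one case per arity
  val indexed: List[Product with Serializable] =
    records.zipWithIndex.map {
      case ((a, b), idx)    => (a, b, idx)
      case ((a, b, c), idx) => (a, b, c, idx)
      case _                => throw new IllegalArgumentException("unsupported arity")
    }

  // Step 2: the static type is still Product with Serializable,
  // so even reading the index back needs one case per arity again
  def lastElement(p: Product with Serializable): Any = p match {
    case (_, _, idx)    => idx
    case (_, _, _, idx) => idx
    case _              => throw new IllegalArgumentException("unsupported arity")
  }
}
```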

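So it's better to bite the bullet right away and convert the data into some kind of Seq as early as possible. You can use productIterator for that. Consider using some kind of List, or spark.sql Rows.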
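A minimal sketch of converting early, assuming plain Scala with a `List` standing in for the RDD (the object `ToSeqEarly` and its sample data are hypothetical). Each record is turned into a `List[Any]` once via `productIterator`; after that, appending the index, reading it back and measuring the arity need no per-arity cases.

```scala
object ToSeqEarly {
  // Hypothetical sample records of mixed arity
  val records: List[Product] = List((1, "a"), (2, "b", 3.0), (4, "c", 5.0, true))

  // Convert once, at the boundary: every record becomes the list of its fields
  val asSeqs: List[List[Any]] = records.map(_.productIterator.toList)

  // Downstream steps work for any arity without pattern matching
  val indexed: List[List[Any]] =
    asSeqs.zipWithIndex.map { case (fields, idx) => fields :+ idx }
  val indices: List[Any] = indexed.map(_.last)
  val arities: List[Int] = asSeqs.map(_.size)
}
```

The price of this design is that the fields are typed as `Any`, so any static type information has to be recovered later, for example by casting.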

map & case on Product with Serializable
