How to create a Dataset of generic tuples in Spark

Time: 2018-03-09 08:19:43

Tags: scala apache-spark tuples spark-dataframe

I am having trouble converting a DataFrame to a Dataset in Spark (2.2.1) with Scala (2.11.8).

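Basically, I am trying to perform an aggregation that, for each row of the left Dataset, collects the matching rows of the right Dataset into a list. I do this step all over the place with case classes and tuples. I don't want to rewrite the same routine over and over, so I decided to refactor the step into this method: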

def collectUsingGenerics[L <: Product : Encoder, R <: Product : Encoder](
    left: Dataset[L],
    right: Dataset[R],
    joinCol: Column,
    groupCol: Column): Dataset[(L, List[R])] = {

  import left.sparkSession.implicits._
  import org.apache.spark.sql.functions._

  val result = left
    .join(right, joinCol)
    .select(
      groupCol.as("groupCol"),
      struct(left("*")).as("_1"),
      struct(right("*")).as("_2"))
    .groupBy($"groupCol")
    .agg(
      first($"_1").as("_1"),
      collect_list($"_2").as("_2")
    )
    .drop($"groupCol")

  // This does not work!
  result.as[(L,List[R])]
}

Unit test:

"collectUsingGenerics" should "collect the right-side Dataset" in {
   val left = spark.createDataset(Seq(
     (1, "Left 1"),
     (2, "Left 2")
   ))

   val right = spark.createDataset(Seq(
     (101, 1, "Right 1"),
     (102, 1, "Right 2"),
     (103, 2, "Right 3")
   ))

  val collectedDataset = Transformations.collectUsingGenerics[(Int, String), (Int, Int, String)](left, right, left("_1") === right("_2"), left("_1"))
      .collect()
      .sortBy(_._1._1)

  val data1 = collectedDataset(0)
  data1._1 should be (1, "Left 1")
  data1._2 should contain only((101, 1, "Right 1"), (102, 1, "Right 2"))
}

The problem is that the code does not compile because an encoder is missing:

Unable to find encoder for type stored in a Dataset.  Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._  Support for serializing other types will be added in future releases.
[error]     result.as[(L,List[R])]
[error]              ^
[error] one error found
[error] (Compile / compileIncremental) Compilation failed

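I was under the impression that importing spark.implicits._ is enough to generate encoders for tuples and case classes, as well as for primitive types. Am I missing something?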

1 answer:

Answer 0: (score: 2)

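You also need an implicit TypeTag for each of these types. The implicit product encoder that spark.implicits._ provides is only available for types that have a TypeTag, and the compiler cannot build one for (L, List[R]) unless it has TypeTags for L and R. See the original question here: scala generic encoder for spark case class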
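As a minimal sketch of that fix, the method from the question could add `TypeTag` context bounds to both type parameters. The enclosing `Transformations` object is taken from the unit test in the question. Everything else is unchanged except that the result is converted directly instead of through an intermediate `result` value. This is an illustration under those assumptions, not necessarily the exact code the answer had in mind, and it has not been checked against every Spark version.

```scala
import org.apache.spark.sql.{Column, Dataset, Encoder}
import org.apache.spark.sql.functions.{collect_list, first, struct}
import scala.reflect.runtime.universe.TypeTag

object Transformations {

  // The TypeTag context bounds let the compiler materialize a
  // TypeTag[(L, List[R])] at this point, which the implicit product
  // encoder imported from sparkSession.implicits._ requires.
  def collectUsingGenerics[L <: Product : Encoder : TypeTag,
                           R <: Product : Encoder : TypeTag](
      left: Dataset[L],
      right: Dataset[R],
      joinCol: Column,
      groupCol: Column): Dataset[(L, List[R])] = {

    import left.sparkSession.implicits._

    left
      .join(right, joinCol)
      .select(
        groupCol.as("groupCol"),
        struct(left("*")).as("_1"),   // whole left row as a struct
        struct(right("*")).as("_2"))  // whole right row as a struct
      .groupBy($"groupCol")
      .agg(
        first($"_1").as("_1"),        // one left row per group
        collect_list($"_2").as("_2")) // all matching right rows
      .drop($"groupCol")
      .as[(L, List[R])]               // encoder is now resolvable
  }
}
```

Call sites such as the unit test in the question need no changes, because the compiler supplies the TypeTags for concrete types like `(Int, String)` automatically.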
