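org.apache.spark.rdd.RDD[((String, Double), (String, Double))] to DataFrame in Scala

Date: 2018-02-24 21:23:40

Tags: scala

I am learning Scala / Spark. A few groupBy operations in Scala resulted in the RDD below. Now I am trying to write it to a SQL DataFrame and save it in Hadoop. However, when written to a SQL DataFrame, it is converted into the nested two-column form shown below.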

Sample RDD format:

Array[((String, Double), (String, Double))] = Array(((Veterans Affairs Dept of,11669.0),(Veterans Affairs Dept of,101124.0)), ((Office Wisc Public Defender,40728.0),(Office Wisc Public Defender,40728.0)))

Using .toDF directly gives:

+--------------------+--------------------+
|                  _1|                  _2|
+--------------------+--------------------+
|[Veterans Affairs...|[Veterans Affairs...|
|[Office Wisc Publ...|[Office Wisc Publ...|
|[Health Services,...|[Health Services,...|
+--------------------+--------------------+

How can I get the above into a flatter format instead?

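2 Answers:

Answer 0 (score: 1)

Since you have already used a groupBy operation, I will assume that the two strings in each element of Array[((String,Double),(String,Double))] are the same. If so, you can try the following: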

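// myRDD: RDD[((String, Double), (String, Double))], e.g. the result of your groupBy
// Outside spark-shell, .toDF also needs `import spark.implicits._` (assuming a SparkSession named spark)

val strings = myRDD.map(a => a._1._1)             // the shared string
val values = myRDD.map(a => (a._1._2, a._2._2))   // the two doubles
val rows = strings.zip(values)                    // RDD[(String, (Double, Double))]
val rowsDF = rows.map { case (a, b) => (a, b._1, b._2) }.toDF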

For example, consider the following dummy data:

val myRDD=sc.parallelize(Array((("string1",1.0),("string1",2.0)),(("string2",3.0),("string2",4.0))))

myRDD: org.apache.spark.rdd.RDD[((String, Double), (String, Double))] = ParallelCollectionRDD[33] at parallelize at <console>:27

Output:

rowsDF: org.apache.spark.sql.DataFrame = [_1: string, _2: double, _3: double]
scala> rowsDF.collect()
res49: Array[org.apache.spark.sql.Row] = Array([string1,1.0,2.0], [string2,3.0,4.0])
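
The same flattening can also be done in a single `map` with a pattern match, which avoids the intermediate `zip`. The following is a minimal sketch and not part of the original answer: the column names `name`, `value1`, `value2`, the local-mode session, and the commented-out output path are all assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for experimenting; in spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("flatten-pairs").getOrCreate()
import spark.implicits._

val myRDD = spark.sparkContext.parallelize(Seq(
  (("string1", 1.0), ("string1", 2.0)),
  (("string2", 3.0), ("string2", 4.0))
))

// Destructure the nested tuples directly; the second string is ignored
// because it is assumed to equal the first.
val flatDF = myRDD
  .map { case ((name, v1), (_, v2)) => (name, v1, v2) }
  .toDF("name", "value1", "value2") // hypothetical column names

flatDF.printSchema()

// Hypothetical output path; adjust to your HDFS layout before uncommenting.
// flatDF.write.parquet("hdfs:///tmp/flat_pairs")
```

Naming the columns in `toDF` is optional, but it may make the saved data easier to query than the default `_1`, `_2`, `_3`.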

Answer 1 (score: 1)

If you have an RDD of ((String, Double), (String, Double)) and want to convert it to an RDD of (String, Double, Double) whenever the string row._1._1 equals the string row._2._1, you can do the following.

val input: Array[((String, Double), (String, Double))] =
    Array((("Veterans Affairs Dept of", 11669.0), ("Veterans Affairs Dept of", 101124.0)),
      (("Office Wisc Public Defender", 40728.0), ("Office Wisc Public Defender", 40728.0)))

Input RDD[((String, Double), (String, Double))]:

import org.apache.spark.rdd.RDD

val myRDD: RDD[((String, Double), (String, Double))] = sc.parallelize(input)

Convert to RDD[(String, Double, Double)] using flatMap:

val resultRDD: RDD[(String, Double, Double)] =
    myRDD.flatMap(row => row._1._1 match {
      case firstString if firstString == row._2._1 =>
        Some((firstString, row._1._2, row._2._2))
      case _ => None
    })

Convert the RDD to a DataFrame:

resultRDD.toDF().show()

Result:

+--------------------+-------+--------+
|                  _1|     _2|      _3|
+--------------------+-------+--------+
|Veterans Affairs ...|11669.0|101124.0|
|Office Wisc Publi...|40728.0| 40728.0|
+--------------------+-------+--------+
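
To see what the guard in the `flatMap` does without starting Spark, the same logic can be tried on an ordinary Scala collection. This is only a sketch under assumptions: the `flatten` helper and the third, mismatched row are made up for illustration, and `collect` with a partial function is used here as an equivalent of a `flatMap` that returns `Some`/`None`.

```scala
type Pair = ((String, Double), (String, Double))

// Hypothetical helper: keep a row only when both strings match,
// then flatten it to (string, firstDouble, secondDouble).
def flatten(rows: Seq[Pair]): Seq[(String, Double, Double)] =
  rows.collect { case ((s1, d1), (s2, d2)) if s1 == s2 => (s1, d1, d2) }

val sample: Seq[Pair] = Seq(
  (("Veterans Affairs Dept of", 11669.0), ("Veterans Affairs Dept of", 101124.0)),
  (("Office Wisc Public Defender", 40728.0), ("Office Wisc Public Defender", 40728.0)),
  (("A", 1.0), ("B", 2.0)) // made-up mismatched row, dropped by the guard
)

val result = flatten(sample)
result.foreach(println)
```

Rows whose two strings differ are dropped silently, so it may be worth counting them first if a mismatch would point to a problem in the data.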