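I am learning Scala/Spark. A few groupBy operations in Scala produced the RDD below. Now I am trying to write it into a SQL DataFrame and save it to Hadoop. However, when I convert it to a SQL DataFrame, each tuple ends up as a nested column instead of separate flat columns.
Sample RDD format: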
Array[((String, Double), (String, Double))] = Array(((Veterans Affairs Dept of,11669.0),(Veterans Affairs Dept of,101124.0)), ((Office Wisc Public Defender,40728.0),(Office Wisc Public Defender,40728.0)))
Calling .toDF directly gives:
+--------------------+--------------------+
|                  _1|                  _2|
+--------------------+--------------------+
|[Veterans Affairs...|[Veterans Affairs...|
|[Office Wisc Publ...|[Office Wisc Publ...|
|[Health Services,...|[Health Services,...|
How can I get this into a flat format, with the name in one column and the two values in their own columns?
Answer 0 (score: 1)
Since you have already used a groupBy operation, I will assume that both strings in Array[((String,Double),(String,Double))] are the same. If so, you can try the following:
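// myRDD is the grouped RDD from the question: RDD[((String, Double), (String, Double))]
val strings = myRDD.map(a => a._1._1)             // the shared name
val values = myRDD.map(a => (a._1._2, a._2._2))   // the two numeric values
val rows = strings.zip(values)                     // both come from myRDD via map, so elements line up
// .toDF needs the SQL implicits (imported automatically in spark-shell)
val rowsDF = rows.map { case (a, b) => (a, b._1, b._2) }.toDF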
For example, consider the following dummy data:
val myRDD=sc.parallelize(Array((("string1",1.0),("string1",2.0)),(("string2",3.0),("string2",4.0))))
myRDD: org.apache.spark.rdd.RDD[((String, Double), (String, Double))] = ParallelCollectionRDD[33] at parallelize at <console>:27
Output:
rowsDF: org.apache.spark.sql.DataFrame = [_1: string, _2: double, _3: double]
scala> rowsDF.collect()
res49: Array[org.apache.spark.sql.Row] = Array([string1,1.0,2.0], [string2,3.0,4.0])
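The map + zip combination above works because both RDDs are derived from myRDD by map, so their elements line up one to one. The same flattening can be sketched in a single pass by destructuring each pair with a pattern. The sketch below runs on a plain Scala Seq rather than an RDD (an assumption made only so it runs without Spark). The names FlattenSketch, Pair and flatten are hypothetical.

```scala
object FlattenSketch {
  // Same element type as the grouped RDD in the question.
  type Pair = ((String, Double), (String, Double))

  // Destructure both inner tuples at once. The second name is ignored,
  // relying on the answer's assumption that both names are equal.
  def flatten(pairs: Seq[Pair]): Seq[(String, Double, Double)] =
    pairs.map { case ((name, first), (_, second)) => (name, first, second) }

  // Dummy data from the answer above.
  val dummy: Seq[Pair] = Seq(
    (("string1", 1.0), ("string1", 2.0)),
    (("string2", 3.0), ("string2", 4.0))
  )

  val flat: Seq[(String, Double, Double)] = flatten(dummy)

  def main(args: Array[String]): Unit =
    flat.foreach(println)
}
```

RDD.map accepts the same `{ case ... }` function. On the RDD itself this would read `myRDD.map { case ((name, first), (_, second)) => (name, first, second) }.toDF()`, with the same implicits requirement as the answer's code, and should yield three flat columns.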
Answer 1 (score: 1)
If your RDD has the element type ((String, Double), (String, Double)), the string row._1._1 equals the string row._2._1, and you want to convert it to (String, Double, Double), you can do the following.
val input: Array[((String, Double), (String, Double))] =
Array((("Veterans Affairs Dept of", 11669.0), ("Veterans Affairs Dept of", 101124.0)),
(("Office Wisc Public Defender", 40728.0), ("Office Wisc Public Defender", 40728.0)))
Create the input RDD[((String, Double), (String, Double))]:
import org.apache.spark.rdd.RDD
val myRDD: RDD[((String, Double), (String, Double))] = sc.parallelize(input)
Convert it to RDD[(String, Double, Double)] using flatMap, keeping only the rows whose two names match:
val resultRDD: RDD[(String, Double, Double)] =
myRDD.flatMap(row => row._1._1 match {
case firstString if firstString == row._2._1 =>
Some((firstString, row._1._2, row._2._2))
case _ => None
})
Convert the RDD to a DataFrame (in spark-shell, the implicits that provide .toDF are already in scope):
resultRDD.toDF().show()
Result:
+--------------------+-------+--------+
| _1| _2| _3|
+--------------------+-------+--------+
|Veterans Affairs ...|11669.0|101124.0|
|Office Wisc Publi...|40728.0| 40728.0|
+--------------------+-------+--------+
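The flatMap-over-Option step above keeps a row only when the two names match. The same filter-and-reshape can also be written with collect and a guarded partial function. RDD offers an overload, collect(PartialFunction), that returns a new RDD, unlike the no-argument collect(), which brings data to the driver. The sketch below runs on a plain Scala Seq so it works without Spark. The names CollectSketch and flattenMatching are hypothetical, and the third input row, with mismatched names, is invented only to show that it gets dropped.

```scala
object CollectSketch {
  type Pair = ((String, Double), (String, Double))

  // Keep only pairs whose names agree, reshaped to (name, first, second).
  // Pairs with differing names do not match the partial function and are dropped.
  def flattenMatching(pairs: Seq[Pair]): Seq[(String, Double, Double)] =
    pairs.collect {
      case ((name, first), (other, second)) if name == other => (name, first, second)
    }

  val input: Seq[Pair] = Seq(
    (("Veterans Affairs Dept of", 11669.0), ("Veterans Affairs Dept of", 101124.0)),
    (("Office Wisc Public Defender", 40728.0), ("Office Wisc Public Defender", 40728.0)),
    // Hypothetical mismatched row, added only to show that it is filtered out.
    (("Name A", 1.0), ("Name B", 2.0))
  )

  val result: Seq[(String, Double, Double)] = flattenMatching(input)

  def main(args: Array[String]): Unit =
    result.foreach(println)
}
```

On the RDD, the equivalent call would be `myRDD.collect { case ((name, first), (other, second)) if name == other => (name, first, second) }` followed by `.toDF()`. Whether this reads better than flatMap with Some/None is a matter of taste, because both express the same filter.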