Spark: writing an Array[Array[Any]] result to a file

Time: 2017-08-14 05:44:45

Tags: scala apache-spark hadoop2

My input looks like the following sample:

3070811,1963,1096,,"US","CA",,1,
3022811,1963,1096,,"US","CA",,1,56
3033811,1963,1096,,"US","CA",,1,23

After replacing the empty fields with 0, I am trying to write the result to a text file, and I get this error:

scala> result.saveAsTextFile("data/result")
<console>:34: error: value saveAsTextFile is not a member of Array[Array[Any]]
              result.saveAsTextFile("data/result")

Here is what I ran:

scala> val file2 = sc.textFile("data/file.txt")
scala> val mapper = file2.map(x => x.split(",",-1))
scala> val result = mapper.map(x => x.map(x => if(x.isEmpty) 0 else x)).collect()
result: Array[Array[Any]] = Array(Array(3070811, 1963, 1096, 0, "US", "CA", 0, 1, 0), Array(3022811, 1963, 1096, 0, "US", "CA", 0, 1, 56), Array(3033811, 1963, 1096, 0, "US", "CA", 0, 1, 23))
scala> result.saveAsTextFile("data/result")
<console>:34: error: value saveAsTextFile is not a member of Array[Array[Any]]
              result.saveAsTextFile("data/result")

I also tried the following, which failed as well:

scala> val output = result.map(x => (x(0),x(1),x(2),x(3), x(4), x(5), x(7), x(8)))
output: Array[(Any, Any, Any, Any, Any, Any, Any, Any)] = Array((3070811,1963,1096,0,"US","CA",1,0), (3022811,1963,1096,0,"US","CA",1,56), (3033811,1963,1096,0,"US","CA",1,23))

scala> output.saveAsTextFile("data/output")
<console>:36: error: value saveAsTextFile is not a member of Array[(Any, Any, Any, Any, Any, Any, Any, Any)]
              output.saveAsTextFile("data/output")

Then I tried adding the following, which also failed:

scala> output.mapValues(_.toList).saveAsTextFile("data/output")
<console>:36: error: value mapValues is not a member of Array[(Any, Any, Any, Any, Any, Any, Any, Any)]
              output.mapValues(_.toList).saveAsTextFile("data/output")

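How can I view the contents of the result or output variables, either in the console or in a result file? I am missing something basic.

Update 1

Following Shankar Koirala's suggestion, I removed .collect and then ran the save.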

scala> val result = mapper.map(x => x.map(x => if(x.isEmpty) 0 else x))

This produced the following output:

[Ljava.lang.Object;@7a1167b6
[Ljava.lang.Object;@60d86d2f
[Ljava.lang.Object;@20e85a55

Update 1.a

I picked up the updated answer, and it gives the correct data:

scala> val result = mapper.map(x => x.map(x => if(x.isEmpty) 0 else x).mkString(","))
result: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[29] at map at <console>:31

scala> result.saveAsTextFile("data/mkstring")

Result:

3070811,1963,1096,0,"US","CA",0,1,0
3022811,1963,1096,0,"US","CA",0,1,56
3033811,1963,1096,0,"US","CA",0,1,23

Update 2

scala> val output = result.map(x => (x(0),x(1),x(2),x(3), x(4), x(5), x(7), x(8)))
output: org.apache.spark.rdd.RDD[(Any, Any, Any, Any, Any, Any, Any, Any)] = MapPartitionsRDD[27] at map at <console>:33

scala> output.saveAsTextFile("data/newOutPut")

I got this result:

(3070811,1963,1096,0,"US","CA",1,0)
(3022811,1963,1096,0,"US","CA",1,56)
(3033811,1963,1096,0,"US","CA",1,23)

1 Answer:

Answer 0 (score: 2)

The following code returns an Array[Array[Any]]:

val result = mapper.map(x => x.map(x => if(x.isEmpty) 0 else x)).collect()

saveAsTextFile is not a method of Array. It is available on RDD, so you do not need to collect the output:

val result = mapper.map(x => x.map(x => if(x.isEmpty) 0 else x))
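If the data is small and has already been collected to the driver, the local array can also be written with ordinary JVM file I/O rather than Spark. The following is only a sketch under that assumption; LocalArrayWriter and writeRows are hypothetical names introduced for illustration, and the rows are copied from the sample in the question.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical helper (not part of Spark): writes an already-collected
// Array[Array[Any]] to a local file, one comma-separated row per line.
object LocalArrayWriter {
  def writeRows(rows: Array[Array[Any]], path: Path): Path = {
    // mkString(",") joins the fields of a row; each row ends with a newline.
    val text = rows.map(_.mkString(",") + "\n").mkString
    Files.write(path, text.getBytes(StandardCharsets.UTF_8))
  }

  def main(args: Array[String]): Unit = {
    val rows: Array[Array[Any]] = Array(
      Array[Any]("3070811", "1963", "1096", 0, "\"US\"", "\"CA\"", 0, "1", 0),
      Array[Any]("3022811", "1963", "1096", 0, "\"US\"", "\"CA\"", 0, "1", "56")
    )
    val out = Files.createTempFile("result", ".txt")
    writeRows(rows, out)
    // Read the file back to confirm what was written.
    print(new String(Files.readAllBytes(out), StandardCharsets.UTF_8))
  }
}
```

This only makes sense when the whole result fits in driver memory; for anything larger, saving from the RDD is the safer route.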

Use mkString() to convert each row to a string before writing it to the file:

val result = mapper.map(x => x.map(x => if(x.isEmpty) 0 else x).mkString(","))
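The reason mkString matters can be seen without Spark. An Array on the JVM has no readable toString, so saving an RDD of arrays writes lines such as [Ljava.lang.Object;@7a1167b6. The sketch below uses plain Scala; RowFormatting and cleanLine are hypothetical names, and replacing empty fields with the string "0" rather than the Int 0 is a suggested variation, not what the original code does. It keeps the row an Array[String] instead of widening it to Array[Any].

```scala
object RowFormatting {
  // Splits a line on commas (limit -1 keeps trailing empty fields),
  // replaces empty fields with the string "0", and joins the row again.
  def cleanLine(line: String): String =
    line.split(",", -1).map(f => if (f.isEmpty) "0" else f).mkString(",")

  def main(args: Array[String]): Unit = {
    val fields = Array[Any]("3070811", 0, "\"US\"")

    // Arrays inherit Object.toString: a type tag followed by a hash code.
    println(fields.toString)

    // mkString renders every element and joins them with the separator.
    println(fields.mkString(","))

    // A sample line from the question, cleaned and re-joined.
    println(cleanLine("3070811,1963,1096,,\"US\",\"CA\",,1,"))
  }
}
```

Inside Spark, the same function could be passed straight to map, for example file2.map(RowFormatting.cleanLine), which yields an RDD[String] that saveAsTextFile writes line by line.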

You should also stop using collect(): it brings all the data to the driver, which can cause memory problems if the data is large.
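To look at a few rows in the console without collect(), take(n) is one option, since it returns only the first n elements to the driver. The sketch below assumes Spark 2.x or later with SparkSession on the classpath and builds a local session purely for illustration (in spark-shell, sc already exists). PeekWithoutCollect and clean are hypothetical names, and the input lines are the sample from the question.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object PeekWithoutCollect {
  // Replaces empty fields with "0" and joins each row back into one string.
  def clean(lines: RDD[String]): RDD[String] =
    lines.map(_.split(",", -1).map(f => if (f.isEmpty) "0" else f).mkString(","))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; an assumption, not the asker's setup.
    val spark = SparkSession.builder().master("local[*]").appName("peek").getOrCreate()

    val lines = spark.sparkContext.parallelize(Seq(
      "3070811,1963,1096,,\"US\",\"CA\",,1,",
      "3022811,1963,1096,,\"US\",\"CA\",,1,56",
      "3033811,1963,1096,,\"US\",\"CA\",,1,23"
    ))

    // take(2) brings at most two rows to the driver, unlike collect().
    clean(lines).take(2).foreach(println)

    spark.stop()
  }
}
```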

Hope this helps!