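My goal is to get the lines of a log file that contain error messages as an RDD. I am reading the log file and filtering the lines that match the word "ERROR", and I need to write the error messages to a database by turning them into an RDD.
I am new to Spark.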
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
val df = sqlContext.read.text( "hdfs://10.90.3.78:9000/user/centuryuidt-3-1-1.out")
val patt: String = "ERROR"
val rdd=df.filter(line => line.contains(patt)).collect()
df.foreach(println)
Executing this code gives the following errors.
<console>:40: error: value contains is not a member of org.apache.spark.sql.Row
val rdd=df.filter(line => line.contains(patt)).collect()
^
<console>:43: error: overloaded method value foreach with alternatives:
(func: org.apache.spark.api.java.function.ForeachFunction[org.apache.spark.sql.Row])Unit <and>
(f: org.apache.spark.sql.Row => Unit)Unit
cannot be applied to (Unit)
df.foreach(println)
^
After making a few changes:
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
val lines = sc.textFile( "hdfs://10.90.3.78:9000/user/centuryuidt-3-1-1.out")
val error = lines.filter(_.contains("ERROR"))
val df = error.toDF()
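This works for me, but I need the DataFrame to hold one row per line; it gives me all the error lines in a single row. Can anyone help me split the lines into rows?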
Answer 0 (score: 0)
Here is my complete example:
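A minimal sketch of such an example, assuming Spark 2.x. The name `errors` matches the snippets in this answer, but the way it is built here is an assumption, as are the `session` value and the sample log written to a temporary file (a stand-in for the HDFS path in the question). It filters on the `value` column instead of calling `contains` on a `Row`, and it passes an explicit lambda to `foreach`, which avoids both compile errors from the question.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// In spark-shell this returns the existing session; standalone it starts a local one.
val session = SparkSession.builder().master("local[*]").appName("error-lines").getOrCreate()

// Hypothetical sample log, written to a temp file so the sketch runs without HDFS.
val logPath = Files.createTempFile("sample", ".out")
Files.write(logPath, "INFO started\nERROR disk full\nWARN slow\nERROR timeout\n".getBytes("UTF-8"))

// read.text yields a DataFrame with one string column named "value", one row per line.
val df: DataFrame = session.read.text(logPath.toUri.toString)

// Filter on the column; Row itself has no `contains` method.
val errors: DataFrame = df.filter(col("value").contains("ERROR"))

// An explicit lambda sidesteps the overload ambiguity of foreach(println).
errors.collect().foreach(row => println(row.getString(0)))
```

Collecting to the driver before printing is only reasonable for small results; for a large log, `errors.show()` would be the safer way to inspect a few rows.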
If you really need the errors as an RDD, note that this is an RDD[Row]:
scala> errors.rdd
res7: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[13] at rdd at <console>:34
If you really need the errors as an RDD[String]:
scala> errors.map(_.getString(0)).rdd
res9: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[19] at rdd at <console>:34
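On the follow-up about getting one row per error line, here is a hedged sketch, again assuming Spark 2.x. The sample lines, the `session` value, and the column name `message` are illustrative and not from the original post. Wrapping each string in a `Tuple1` lets `createDataFrame` produce one row per RDD element without needing the session implicits, and `.rdd.map` converts back to `RDD[String]` without requiring an encoder.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

// In spark-shell this returns the existing session; standalone it starts a local one.
val session = SparkSession.builder().master("local[*]").appName("errors-rdd").getOrCreate()

// Hypothetical sample lines standing in for sc.textFile(...) on the real log.
val lines: RDD[String] =
  session.sparkContext.parallelize(Seq("INFO started", "ERROR disk full", "ERROR timeout"))

// Keep only the lines that mention ERROR.
val errorLines: RDD[String] = lines.filter(_.contains("ERROR"))

// Each RDD element becomes its own row; "message" is an illustrative column name.
val errorDf: DataFrame = session.createDataFrame(errorLines.map(Tuple1(_))).toDF("message")
errorDf.show(false)

// Back from DataFrame to RDD[String]: drop to RDD[Row] first, then extract column 0.
val backToRdd: RDD[String] = errorDf.rdd.map(_.getString(0))
```

If the resulting DataFrame still shows a single row on real data, it may be worth checking whether the input file actually contains line breaks, since `textFile` splits records on newlines.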