The only approach I can think of is withColumn():
import org.apache.spark.sql.functions.col
val df2 = df.withColumn("newcolname", col("colname") + 1) // df and "colname" are placeholder names
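But how can I generalize this to text data? For example, my DataFrame has
string values such as "This is an example of a string", and I want to extract
the first and last words into val arraystring: Array[String] = Array(first, last).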
Answer 0 (score: 2)
Is this what you're looking for?
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.{col, udf}
val sc: SparkContext = ...
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
// Split on single spaces; the Option guards return "" when the split yields no tokens (e.g. " ")
val extractFirstWord = udf((sentence: String) => sentence.split(" ").headOption.getOrElse(""))
val extractLastWord = udf((sentence: String) => sentence.split(" ").lastOption.getOrElse(""))
val sentences = sc.parallelize(Seq("This is an example", "And this is another one", "One_word", "")).toDF("sentence")
val splits = sentences
  .withColumn("first_word", extractFirstWord(col("sentence")))
  .withColumn("last_word", extractLastWord(col("sentence")))
splits.show()
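The string logic inside the two UDFs can be tried without Spark. This is a minimal sketch assuming plain Scala; `FirstLastWords` and `firstAndLast` are hypothetical names. Returning `Array(first, last)` mirrors the shape asked for in the question, not anything in the answer:

```scala
object FirstLastWords {
  // Hypothetical helper (not part of the answer): same split-on-space logic as the
  // extractFirstWord / extractLastWord UDFs, returned as Array(first, last).
  // headOption / lastOption guard against inputs like " " that split to an empty array.
  def firstAndLast(sentence: String): Array[String] = {
    val words = sentence.split(" ")
    Array(words.headOption.getOrElse(""), words.lastOption.getOrElse(""))
  }

  def main(args: Array[String]): Unit = {
    println(firstAndLast("This is an example").mkString(","))      // This,example
    println(firstAndLast("And this is another one").mkString(",")) // And,one
    println(firstAndLast("One_word").mkString(","))                // One_word,One_word
  }
}
```

Inside Spark, the same function body can be wrapped in a single udf that returns an array column, instead of two udfs that each return one string.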
The output is then:
+--------------------+----------+---------+
|            sentence|first_word|last_word|
+--------------------+----------+---------+
|  This is an example|      This|  example|
|And this is anoth...|       And|      one|
|            One_word|  One_word| One_word|
|                    |          |         |
+--------------------+----------+---------+
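The last two rows follow from how `String.split` (the Java method, called from Scala) handles edge cases. Here is a small plain-Scala check of that behavior; `SplitEdgeCases` is a hypothetical name:

```scala
object SplitEdgeCases {
  def main(args: Array[String]): Unit = {
    // No space in the input: one element, so the first and last word coincide.
    println("One_word".split(" ").toList)  // List(One_word)
    // Empty input: no match, so the result is Array("") and both words are "".
    println("".split(" ").length)          // 1
    // Trailing empty strings are dropped, so a trailing space is harmless...
    println("a b ".split(" ").toList)      // List(a, b)
    // ...but a leading space yields an empty first element,
    println(" a b".split(" ").toList)      // List(, a, b)
    // and a whitespace-only string splits into an empty array.
    println(" ".split(" ").length)         // 0
  }
}
```

Calling `.head` on an empty array throws, so `headOption.getOrElse("")` is the safer choice inside a UDF. If the data may contain leading or repeated spaces, `sentence.trim.split("\\s+")` is a common variant. That variant is a suggestion here, not part of the original answer.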
Answer 1 (score: 1)