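I have two DataFrames: the first has two columns and the second has five fields. I want to compare the first DataFrame against specific columns of the second. If a matching row exists, I do an update; otherwise I insert the two columns into those two specific fields. I am new to this and need some help to move forward. Thanks.
Here is what I have done so far: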
package Test

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object TMP_STRUCTURE extends App {
  System.setProperty("hadoop.home.dir", "C:\\hadoop")
  System.setProperty("spark.sql.warehouse.dir", "file:///C:/spark-warehouse")
  val sparkSession = SparkSession.builder.master("local").appName("spark session example").getOrCreate()

  // Read the Oracle table TMP_STRUCTURE over JDBC.
  // read.format("jdbc") replaces the deprecated SQLContext.load call.
  val df = sparkSession.read.format("jdbc").options(Map(
    "url" -> "jdbc:oracle:thin:IPTECH/IPTECH@//localhost:1521/XE",
    "dbtable" -> "IPTECH.TMP_STRUCTURE")).load()
  df.printSchema()

  // Read the PostgreSQL table article_groups over JDBC.
  val article_groups = sparkSession.read.format("jdbc").options(Map(
    "url" -> "jdbc:postgresql://localhost:5432/gemodb?user=postgres&password=maher",
    "dbtable" -> "article_groups")).load()
  article_groups.printSchema()
}
Output of the two printSchema() calls:
root
|-- CODE: string (nullable = false)
|-- LIBELLE: string (nullable = false)
root
|-- id: long (nullable = false)
|-- is_enabled: boolean (nullable = true)
|-- code: string (nullable = true)
|-- name: string (nullable = true)
|-- number: string (nullable = true)
I want to do this based on the columns CODE and id, matching CODE and LIBELLE against id and name. Any help is appreciated, thanks.
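
The update-or-insert split described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a confirmed solution: it assumes CODE is matched against the code column of article_groups and that LIBELLE corresponds to name (the question does not settle which columns pair up). The object name UpsertSketch, the helper splitForUpsert, and the sample rows are all hypothetical; small in-memory DataFrames stand in for the two JDBC tables.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object UpsertSketch {
  // Hypothetical helper: splits `source` into rows whose key already exists
  // in `target` (candidates for update) and rows whose key is missing
  // (candidates for insert).
  // Assumption: CODE pairs with code, and LIBELLE pairs with name.
  def splitForUpsert(source: DataFrame, target: DataFrame): (DataFrame, DataFrame) = {
    // Rename the source columns to the target's column names.
    val incoming = source.select(col("CODE").as("code"), col("LIBELLE").as("name"))
    val existingCodes = target.select("code")
    // left_semi keeps incoming rows that have a match in the target.
    val toUpdate = incoming.join(existingCodes, Seq("code"), "left_semi")
    // left_anti keeps incoming rows that have no match in the target.
    val toInsert = incoming.join(existingCodes, Seq("code"), "left_anti")
    (toUpdate, toInsert)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[1]").appName("upsert sketch").getOrCreate()
    import spark.implicits._

    // Made-up sample rows in place of the Oracle and PostgreSQL tables.
    val tmpStructure = Seq(("A1", "Group one renamed"), ("B2", "Group two"))
      .toDF("CODE", "LIBELLE")
    val articleGroups = Seq((1L, true, "A1", "Group one", "001"))
      .toDF("id", "is_enabled", "code", "name", "number")

    val (toUpdate, toInsert) = splitForUpsert(tmpStructure, articleGroups)
    toUpdate.show() // rows whose code already exists in article_groups
    toInsert.show() // rows whose code is not yet in article_groups
    spark.stop()
  }
}
```

Note that Spark's built-in JDBC writer only appends to or overwrites a table; it does not issue UPDATE statements. The toInsert rows could be written with append mode (the non-nullable id column would still need a value from somewhere, for example a database default), while the toUpdate rows would have to be applied some other way, such as plain JDBC statements or a staging table merged on the database side.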