Create a new column in a DataFrame by encrypting an existing column in Java Spark

Posted: 2019-12-17 18:13:57

Tags: java apache-spark apache-spark-sql

I have a CSV file with the following data:

+---+----------+-----------+--------------------+------+------------+
| id|first_name|  last_name|               email|gender|phone_number|
+---+----------+-----------+--------------------+------+------------+
|  1|      Fitz|        Eim|    feim0@unesco.org|  Male|348-553-7982|
|  2|     Cynde|    Worters| cworters1@github.io|Female|682-689-5203|
|  3|    Sandye|     Jaksic|  sjaksic2@google.es|Female|896-867-0193|
|  4|     Viole|    Ritzman|vritzman3@meetup.com|Female|195-375-0157|
|  5|       Ira| Blackaller|iblackaller4@soup.io|  Male|895-872-8224|
+---+----------+-----------+--------------------+------+------------+

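I want to populate a new column named email_encrypted with the encrypted value of the corresponding email column. I am using Java Spring's TextEncryptor, which takes a String value. The output should look like the table below, except that email_encrypted should contain the encrypted email rather than the plain text.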
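For reference, here is a minimal sketch of how Spring's TextEncryptor is typically used on its own, outside Spark. It assumes spring-security-crypto is on the classpath and builds the encryptor with Encryptors.text(password, salt); the class name TextEncryptorDemo, the password, and the hex salt are hypothetical placeholders. The point it illustrates is that encrypt and decrypt take a single String, so the call has to be made once per row, not once for the whole column.

```java
import org.springframework.security.crypto.encrypt.Encryptors;
import org.springframework.security.crypto.encrypt.TextEncryptor;

public class TextEncryptorDemo {
    public static void main(String[] args) {
        // Hypothetical password and hex-encoded salt; replace with real secrets.
        TextEncryptor encryptor = Encryptors.text("change-me", "5c0744940b5c369b");

        String email = "feim0@unesco.org";

        // encrypt() works on one String value at a time, not on a Spark Column.
        String first = encryptor.encrypt(email);
        String second = encryptor.encrypt(email);
        System.out.println(first);

        // Encryptors.text uses a random IV per call, so the two ciphertexts differ.
        System.out.println(first.equals(second));

        // Both ciphertexts still decrypt back to the original email.
        System.out.println(encryptor.decrypt(second).equals(email));
    }
}
```

Because of the random IV, the same email encrypts to a different value on every run; if email_encrypted ever needs to be compared or joined on, a deterministic scheme would be required instead.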

+---+----------+-----------+--------------------+------+------------+--------------------+
| id|first_name|  last_name|               email|gender|phone_number|     email_encrypted|
+---+----------+-----------+--------------------+------+------------+--------------------+
|  1|      Fitz|        Eim|    feim0@unesco.org|  Male|348-553-7982|    feim0@unesco.org|
|  2|     Cynde|    Worters| cworters1@github.io|Female|682-689-5203| cworters1@github.io|
|  3|    Sandye|     Jaksic|  sjaksic2@google.es|Female|896-867-0193|  sjaksic2@google.es|
|  4|     Viole|    Ritzman|vritzman3@meetup.com|Female|195-375-0157|vritzman3@meetup.com|
|  5|       Ira| Blackaller|iblackaller4@soup.io|  Male|895-872-8224|iblackaller4@soup.io|
+---+----------+-----------+--------------------+------+------------+--------------------+

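I can't figure out how to apply the encryption to each value in the column; any working solution would be appreciated. Here is the code I tried, which does not work: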

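import org.apache.spark.sql.*;
import static org.apache.spark.sql.functions.lit;

SparkSession spark = SparkSession.builder().appName("Simple Application").master("local").getOrCreate();
Dataset<Row> data = spark.read().format("csv").option("header", "true").load("/Users/amansaurav/Documents/Workspace/com_falabella_encrypt_cross_bu-master/src/main/resources/MOCK_DATA.csv");
// Does not work: data.col("email") is a Column expression, not the String value of each row, so it cannot be passed to a String-based encrypt() and wrapped in lit().
Dataset<Row> dataNew = data.withColumn("email_encrypted", lit(Encryptor.encrypt(data.col("email"))));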
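One common way to run a String-to-String function on every row is to wrap it in a user-defined function (UDF). Below is a minimal sketch, assuming Spark 2.3 or later (for functions.udf(UDF1, DataType)) and spring-security-crypto on the classpath. The class name EncryptEmailColumn, the encryptValue helper, and the PASSWORD/SALT constants are hypothetical, and Encryptors.text stands in for however the existing Encryptor class builds its TextEncryptor.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.udf;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.expressions.UserDefinedFunction;
import org.apache.spark.sql.types.DataTypes;
import org.springframework.security.crypto.encrypt.Encryptors;
import org.springframework.security.crypto.encrypt.TextEncryptor;

public class EncryptEmailColumn {

    // Hypothetical placeholders: supply your own secret and hex-encoded salt
    // (for example one produced by KeyGenerators.string().generateKey()).
    private static final String PASSWORD = "change-me";
    private static final String SALT = "5c0744940b5c369b";

    // TextEncryptor is not declared Serializable, so it is not captured by the UDF.
    // Each JVM (driver or executor) lazily builds one instance on first use instead.
    private static final class Holder {
        static final TextEncryptor ENCRYPTOR = Encryptors.text(PASSWORD, SALT);
    }

    // Encrypts a single value; null emails stay null instead of throwing.
    static String encryptValue(String email) {
        return email == null ? null : Holder.ENCRYPTOR.encrypt(email);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("Simple Application").master("local").getOrCreate();

        // args[0]: path to MOCK_DATA.csv
        Dataset<Row> data = spark.read().format("csv")
                .option("header", "true")
                .load(args[0]);

        // Wrap the per-value call in a UDF so Spark invokes it once per row.
        UserDefinedFunction encryptUdf =
                udf((UDF1<String, String>) EncryptEmailColumn::encryptValue, DataTypes.StringType);

        Dataset<Row> dataNew = data.withColumn("email_encrypted", encryptUdf.apply(col("email")));
        dataNew.show(false);

        spark.stop();
    }
}
```

The static holder matters because Spark serializes UDF closures to ship them to executors (even in local mode); keeping the encryptor out of the closure avoids serialization errors and also avoids re-deriving the key for every row.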

0 answers