How to deserialize JSON in Scala and Java

Asked: 2018-07-25 08:18:50

Tags: java json scala apache-spark

I am new to Scala. I need to connect to a database and select a column named "queue_message" from the table "queue". This column contains a JSON document:

{"LOG_ID":"2442204","CUSTOMER_CODE":"79D3QL","CFILE_WEIGHT":"1","PROVIDER_ID":"","FILETYPE_DIRECTORYFROM":"\\FromCustomer","FILE_CHARSET":"","CFILE_FORMAT":"CSV","FILE_NAME":"1475_18032018T164840_1.csv","FILETYPE_LABEL":"Order","FILE_ID":1475,"FILEFORMAT_CODE":"","CUSTOMER_ID":1016,"FILE_MASK":"wt_cde_*-*_*.csv"}
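For reference, a Spark SQL schema matching this payload could be sketched as follows. This is an assumption on my part: the field names come from the sample above, and the string/long split is inferred from the sample values (most fields arrive as strings, while `FILE_ID` and `CUSTOMER_ID` are numeric) and may need adjusting.

```scala
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

// Sketch of a schema for the queue_message payload shown above.
val queueMessageSchema = StructType(Seq(
  StructField("LOG_ID", StringType),
  StructField("CUSTOMER_CODE", StringType),
  StructField("CFILE_WEIGHT", StringType),
  StructField("PROVIDER_ID", StringType),
  StructField("FILETYPE_DIRECTORYFROM", StringType),
  StructField("FILE_CHARSET", StringType),
  StructField("CFILE_FORMAT", StringType),
  StructField("FILE_NAME", StringType),
  StructField("FILETYPE_LABEL", StringType),
  StructField("FILE_ID", LongType),
  StructField("FILEFORMAT_CODE", StringType),
  StructField("CUSTOMER_ID", LongType),
  StructField("FILE_MASK", StringType)
))
```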

I need to deserialize this column in Scala (or in Java as a second option), and then serialize another structure into JSON format.

Here is my code in Scala:

package com.orienit.spark.training.sparkexamples

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import java.sql.DriverManager
import com.microsoft.sqlserver.jdbc
import org.apache.spark.rdd.JdbcRDD
import java.sql.ResultSet




object WordCount {
  def main(args: Array[String]): Unit = {

    val conf = new SparkConf()
      .setAppName("my first scala App")
      .setMaster("local")

    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    val url = "jdbc:sqlserver://localhost:1433;user=xsxx;password=xxx;databaseName=xxx"

    val df = sqlContext
      .read
      .format("jdbc")
      .option("url", url)
      .option("dbtable", "(select top 1 queue_message from mq..queue where queuename_id = 4 order by queue_id desc) as sq")
      .load()

    df.show()
    println(df.collectAsList())
  }
}

These are the dependencies I use in the Maven pom.xml of my Scala project:

<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>2.11.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.3.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.3.0</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>3.8.1</version>
        <scope>test</scope>
    </dependency>
</dependencies>

Here is my code in Java:

package com.orienit.spark.training.javaJdbcConnectivity;

import java.util.HashMap;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;

public class WordCount {

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setMaster("local").setAppName("My app");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        Map<String, String> options = new HashMap<String, String>();
        options.put("url", "jdbc:sqlserver://localhost:1433;user=xsxx;password=xxx;databaseName=xxx");
        options.put("dbtable", "(select top 1 queue_message from mq..queue where queuename_id = 4 order by queue_id desc) as sq");

        Dataset<Row> df = sqlContext.read().format("jdbc").options(options).load();
        df.show();
        System.out.println(df.collectAsList());
        // Note: printing df.toJSON() directly only prints the Dataset's toString();
        // call show() (or collect) to see the JSON strings themselves.
        df.toJSON().show(false);
    }
}

These are the dependencies of my Java project:

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.3.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.3.0</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>3.8.1</version>
        <scope>test</scope>
    </dependency>
</dependencies>

Can anyone help me serialize to and deserialize from JSON format, or point me to any relevant documentation on this topic? I did not find anything useful for this kind of operation in the official Spark documentation.

Thanks a lot

1 answer:

Answer 0 (score: 0)

You can use the from_json function in Spark.

Assuming the schema of the JSON is `schema`, you can simply do the following:

import org.apache.spark.sql.functions.from_json
// The $"..." column syntax requires: import spark.implicits._
df.withColumn("deserialized", from_json($"queue_message", schema))
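A fuller sketch of the round trip (deserializing queue_message with from_json and serializing another structure back out with to_json) might look like the following. This is a sketch under assumptions: it builds a local SparkSession and an in-memory stand-in for the JDBC DataFrame, and the two-field schema covers only a subset of the real payload.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json, struct, to_json}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val spark = SparkSession.builder().master("local").appName("json-roundtrip").getOrCreate()
import spark.implicits._

// Minimal schema covering two of the fields from the sample payload.
val schema = StructType(Seq(
  StructField("LOG_ID", StringType),
  StructField("FILE_NAME", StringType)
))

// Stand-in for the JDBC DataFrame: one row with a queue_message JSON string.
val df = Seq(
  """{"LOG_ID":"2442204","FILE_NAME":"1475_18032018T164840_1.csv"}"""
).toDF("queue_message")

// Deserialize: parse the JSON string into a struct column.
val parsed = df.withColumn("deserialized", from_json($"queue_message", schema))
parsed.select($"deserialized.LOG_ID", $"deserialized.FILE_NAME").show(false)

// Serialize: build another structure and render it back to a JSON string.
val reserialized = parsed.withColumn(
  "out_json",
  to_json(struct(col("deserialized.LOG_ID").as("id"), col("deserialized.FILE_NAME").as("file")))
)
reserialized.select("out_json").show(false)
```

The same pattern works from Java via `functions.from_json(df.col("queue_message"), schema)`; the `$"..."` shorthand is Scala-only.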