How to use an implicit encoder for a Dataset

Asked: 2019-07-02 16:42:10

Tags: apache-spark apache-kafka spark-streaming spark-structured-streaming

I am migrating an application from Spark Streaming to Structured Streaming. The application reads logs from a Kafka topic, parses them, and saves the result to Cassandra.
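For reference, the read side of a Structured Streaming job against Kafka typically looks like the sketch below; the broker address and topic name here are placeholders, not my real configuration:

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().appName("log-parser").getOrCreate()
  import spark.implicits._

  // Subscribe to the Kafka topic; the value column arrives as binary,
  // so it is cast to String before parsing.
  val rawDS = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092") // placeholder
    .option("subscribe", "logs")                      // placeholder topic
    .load()
    .selectExpr("CAST(value AS STRING)")
    .as[String]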

When compiling, I get this error:

C:x\x\\CassandraHelper.scala:425:122: Unable to find encoder for type com.xx.dtl.business.cassandra.ConnectionCassDto. An implicit Encoder[com.xx.dtl.business.cassandra.ConnectionCassDto] is needed to store com.xx.dtl.business.cassandra.ConnectionCassDto instances in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
[error] val successConnectionDS = connectionDS.filter(x => x.libelleOperation.equals(xx_SUCCESSFULL_IDENTIFICATION)).flatMap(connection => mapToDto(connection))

This is where the error occurs:

def persisteConnection(connectionDS: Dataset[Connection]): Unit = {
  val successConnectionDS = connectionDS
    .filter(x => x.libelleOperation.equals(ATOS_SUCCESSFULL_IDENTIFICATION))
    .flatMap(connection => mapToDto(connection))
  val failedConnectionDS = connectionDS
    .filter(x => x.libelleOperation.equals(ATOS_FAILURE_IDENTIFICATION))
    .flatMap(connection => mapToDto(connection))

  successConnectionDS.saveToCassandra(
    AppConf.CassandraReferentielValorizationKeySpace, "connexion_reussie",
    SomeColumns("identifiant_web", "date_connexion", "code_pays", "coords",
      "city_name", "region_name", "isp", "asn", "id_personne", "id_dim_temps",
      "ip", "pays", "session_id", "client_media_id", "brs_session_id"))

  failedConnectionDS.saveToCassandra(
    AppConf.CassandraReferentielValorizationKeySpace, "connexion_echouee",
    SomeColumns("identifiant_web", "date_connexion", "code_pays", "coords",
      "city_name", "region_name", "isp", "asn", "id_personne", "id_dim_temps",
      "ip", "pays", "session_id", "client_media_id", "brs_session_id"))
}
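One thing I am unsure about after the migration: as far as I can tell, saveToCassandra comes from the Spark Cassandra Connector's RDD API, so if connectionDS is a streaming Dataset the write would presumably have to go through foreachBatch (Spark 2.4+). A minimal sketch of that pattern, assuming the same keyspace and table as above:

  // Sketch only: write each micro-batch of the streaming Dataset to Cassandra
  // through the DataFrame writer instead of the RDD-based saveToCassandra.
  successConnectionDS.writeStream
    .foreachBatch { (batch: Dataset[ConnectionCassDto], batchId: Long) =>
      batch.write
        .format("org.apache.spark.sql.cassandra")
        .option("keyspace", AppConf.CassandraReferentielValorizationKeySpace)
        .option("table", "connexion_reussie")
        .mode("append")
        .save()
    }
    .start()

The mapToDto helper used by both branches: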

def mapToDto(connection: Connection): Option[ConnectionCassDto] = {
Some(new ConnectionCassDto(
  connection.id_web,
  connection.id_dim_temps,
  connection.timestamp,
  connection.contact_id,
  EmptyStringField,
  connection.code_pays,
  connection.coords.mkString(", "),
  connection.city_name,
  connection.region_name,
  connection.isp,
  connection.asn,
  connection.ip,
  connection.sessionID,
  connection.client_media_id,
  connection.brsSessionId))

}

Basically, I replaced all the DStreams with Datasets, and changed the way I read from Kafka accordingly. I did not change anything in the parsing step: I process the Dataset the same way I processed the DStream.
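From the error message, the missing piece is an implicit Encoder[ConnectionCassDto]: spark.implicits._ only derives encoders for primitives and Product types (case classes), and ConnectionCassDto appears to be a plain class (it is built with new). Two options I have found, sketched under that assumption:

  import org.apache.spark.sql.{Encoder, Encoders}

  // Option 1: turn ConnectionCassDto into a case class with the same fields,
  // so that import spark.implicits._ derives its encoder automatically.

  // Option 2: keep it a plain class and put a Kryo-based encoder in scope
  // before the filter/flatMap chain in persisteConnection.
  implicit val connectionCassDtoEncoder: Encoder[ConnectionCassDto] =
    Encoders.kryo[ConnectionCassDto]

Note that the Kryo encoder stores the whole object as a single binary column, which matters if the result is later written through the Dataset/DataFrame API rather than the RDD-based saveToCassandra.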

Any clues?

0 answers:

There are no answers yet.