Searching for keys in a Scala Map based on different key formats

Time: 2018-08-04 13:26:13

Tags: scala apache-spark

I have a Map that contains RDBMS data types as keys and Hive data types as values.

var dataMap:Map[String, String] = dataMapper
for((k,v) <- dataMap) {
   println(k + "->"+ v)
}

Output:

character varying->string
character\([0-9]{1,3}\)->string
timestamp without time zone->timestamp
name->string
timestamp\([0-9]{1,3}\) without time zone->timestamp
timestamp with time zone->timestamp
timestamp->timestamp
real->double
character varying\([0-9]{1,4}\)->string
numeric\([0-9]{1,3},[1-9][0-9]{0,2}\)->double
smallint->int
timestamp\([0-9]{1,3}\) with time zone->timestamp
timestamp\([0-9]{1,3}\)->timestamp
unknown->string
text->string
time without time zone->timestamp
bpchar->string
date->date
character->string
numeric->double
numeric\([0-9]{1,3},0\)->bigint
integer->int
bigint->bigint
time with time zone->timestamp
double precision->double

I also have a list of column names and their data types; the data types come from a Greenplum database (RDBMS), as shown below:

Column Name             Datatype
forecast_id             bigint
period_year             numeric(15,0)
period_name             character varying(15)
org                     character varying(10)
ledger_id               bigint
currency_code           character varying(15)
source_system_name      character varying(30)
db_source_system_name   character varying(30)
year                    character varying(256)
ptd_balance             numeric
xx_creation_tms         timestamp without time zone
xx_last_update_log_id   integer
xx_data_hash_code       character varying(32)
xx_pk_id                bigint

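I need to change each column's data type: check whether dataMap contains the column's data type as a key and, if it does, take the corresponding value and pair it with the column name. This is the code I wrote: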

import scala.collection.mutable.ListBuffer

class ChangeDataTypes(var gpColumnDetails: List[String], var dataMapper:Map[String, String]) {
  var recGpDet:ListBuffer[String] = gpColumnDetails.to[ListBuffer]
  var dataMap:Map[String, String] = dataMapper
  def gpDetails(): Unit = {
    val schemaString:List[String] = recGpDet.map(s => s.split(":")).map(s => s(0) + " " + dMap(s(1))).toList
    for(i <- schemaString) {
      println(i)
    }
  }
  def dMap(rdbmsColDataType: String): String ={
    var hiveDataType:String=null
    // Exact-key lookup: only finds types that are literally equal to a key
    if(dataMap.keysIterator.contains(rdbmsColDataType)) {
      hiveDataType = dataMap(rdbmsColDataType)
    }
    hiveDataType
  }
}

Running the code gives the following output:

forecast_id             bigint
period_year             null
period_name             null
org                     null
ledger_id               bigint
currency_code           null
source_system_name      null
db_source_system_name   null
year                    null
ptd_balance             double
xx_creation_tms         timestamp
xx_last_update_log_id   int
xx_data_hash_code       null
xx_pk_id                null

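The correct values in the output appear because the exact type string exists as a key in the Map. I get null values because of keys such as character varying\([0-9]{1,4}\), numeric\([0-9]{1,3},[1-9][0-9]{0,2}\), numeric\([0-9]{1,3},0\), and so on, which are regular-expression patterns rather than literal type names. Can anyone tell me how to write a condition that finds the matching key in dataMap for every data type?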
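To make the failure concrete, here is a minimal sketch using two of the question's keys. The object ExactVsPatternLookup and its methods are hypothetical names for illustration: exactHit mirrors what dMap does with keysIterator.contains, while patternHit treats each key as a regular expression that must match the whole type string (Java's String.matches requires a full match).

```scala
// Illustration only: two keys copied from the question's dataMap
object ExactVsPatternLookup {
  val dataMap: Map[String, String] = Map(
    "character varying\\([0-9]{1,4}\\)" -> "string",
    "bigint" -> "bigint"
  )

  // What dMap effectively does: compare the type to each key as plain text
  def exactHit(gpType: String): Boolean = dataMap.contains(gpType)

  // Treat each key as a regex that must match the entire type string
  def patternHit(gpType: String): Boolean = dataMap.keys.exists(key => gpType.matches(key))

  def main(args: Array[String]): Unit = {
    // prints "bigint exact=true pattern=true"
    // then   "character varying(15) exact=false pattern=true"
    for (t <- List("bigint", "character varying(15)"))
      println(s"$t exact=${exactHit(t)} pattern=${patternHit(t)}")
  }
}
```

Only the literal key "bigint" is found by plain equality; "character varying(15)" is never equal to the pattern string that describes it.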

1 Answer:

Answer 0 (score: 1)

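To look up values by the keys in dataMap, you first need to map each Greenplum data type to the key format used in dataMap. This can be done by matching each Greenplum data type against the dataMap keys as regular expressions, as in the following example (only a subset of dataMap is assembled):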

val dataMap: Map[String, String] = Map(
  "character varying" -> "string",
  "character\\([0-9]{1,3}\\)" -> "string",
  "character varying\\([0-9]{1,4}\\)" -> "string",
  "timestamp without time zone" -> "timestamp",
  "timestamp" -> "timestamp",
  "numeric" -> "double",
  "numeric\\([0-9]{1,3},0\\)" -> "bigint",
  "integer" -> "int",
  "bigint" -> "bigint"
)

val gpSchema: List[String] = List(
  "forecast_id: bigint",
  "period_year: numeric(15,0)",
  "period_name: character varying(15)",
  "org: character varying(10)",
  "ledger_id: bigint",
  "currency_code: character varying(15)",
  "source_system_name: character varying(30)",
  "db_source_system_name: character varying(30)",
  "year: character varying(256)",
  "ptd_balance: numeric",
  "xx_creation_tms: timestamp without time zone",
  "xx_last_update_log_id: integer",
  "xx_data_hash_code: character varying(32)",
  "xx_pk_id: bigint"
)

val patterns = dataMap.keySet

gpSchema.
  map( _.split(":\\s*") match { case Array(x: String, y: String) => (x, y) } ).
  map{ case (k, v) =>
    val vkey = patterns.dropWhile{ p => v != p.r.findFirstIn(v).getOrElse("") }.
      headOption match {
        case Some(p) => p
        case None => ""
      }

    (k, dataMap.getOrElse(vkey, "n/a"))
  }

// res1: List[(String, String)] = List(
//   (forecast_id,bigint), (period_year,bigint), (period_name,string), (org,string),
//   (ledger_id,bigint), (currency_code,string), (source_system_name,string),
//   (db_source_system_name,string), (year,string), (ptd_balance,double),
//   (xx_creation_tms,timestamp), (xx_last_update_log_id,int), (xx_data_hash_code,string),
//   (xx_pk_id,bigint)
// )
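The dropWhile { ... }.headOption combination above amounts to "find the first key whose pattern matches the whole type string". A more compact sketch of the same lookup, assuming full-match semantics via String.matches, can use collectFirst; the object name PatternLookup is hypothetical. For the keys in this map the result should be the same as the answer's version. Because Map iteration order is unspecified, both versions rely on at most one key fully matching a given type, which appears to hold for the keys listed in the question.

```scala
object PatternLookup {
  // Subset of dataMap from the answer (keys are regex patterns or literal type names)
  val dataMap: Map[String, String] = Map(
    "character varying" -> "string",
    "character\\([0-9]{1,3}\\)" -> "string",
    "character varying\\([0-9]{1,4}\\)" -> "string",
    "timestamp without time zone" -> "timestamp",
    "timestamp" -> "timestamp",
    "numeric" -> "double",
    "numeric\\([0-9]{1,3},0\\)" -> "bigint",
    "integer" -> "int",
    "bigint" -> "bigint"
  )

  // Hive type of the first key whose regex matches the entire Greenplum type,
  // or "n/a" when no key matches (same fallback as the answer)
  def hiveType(gpType: String): String =
    dataMap.collectFirst { case (pattern, hive) if gpType.matches(pattern) => hive }
      .getOrElse("n/a")
}
```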

To fit the pattern matching above into your existing code, the ChangeDataTypes class can be modified as follows:

class ChangeDataTypes(val gpColumnDetails: List[String], val dataMap: Map[String, String]) {
  def gpDetails(): Unit =
    gpColumnDetails.map(_.split(":\\s*")).map(s => s(0) + "\t" + dMap(s(1))).toList.
      foreach(println)

  def dMap(gpColType: String): String = {
    val patterns = dataMap.keySet
    val mkey = patterns.dropWhile{
        p => gpColType != p.r.findFirstIn(gpColType).getOrElse("")
      }.
      headOption match {
        case Some(p) => p
        case None => ""
      }
    dataMap.getOrElse(mkey, "n/a")
  }
}
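One possible refinement, sketched under a few assumptions: the class name GpToHiveMapper and its methods are hypothetical, input lines are assumed to use the "name: type" format of gpSchema above, and lines without a colon are silently skipped instead of raising a MatchError. It compiles each key once with java.util.regex.Pattern rather than calling p.r on every lookup, and returns the converted pairs so they can be reused (for example to build a Hive or Spark schema string) as well as printed.

```scala
import java.util.regex.Pattern

// Hypothetical variant of ChangeDataTypes (the name GpToHiveMapper is illustrative)
class GpToHiveMapper(dataMap: Map[String, String]) {
  // Compile every key once instead of calling p.r on each lookup
  private val compiled: List[(Pattern, String)] =
    dataMap.toList.map { case (key, hive) => (Pattern.compile(key), hive) }

  // "name: type" with optional whitespace around the colon; Scala's Regex
  // extractor only succeeds when the whole line matches
  private val ColumnLine = """\s*([^:]+?)\s*:\s*(.+?)\s*""".r

  // First Hive type whose key matches the entire Greenplum type, if any
  def hiveType(gpType: String): Option[String] =
    compiled.collectFirst { case (p, hive) if p.matcher(gpType).matches() => hive }

  // Column name paired with its Hive type ("n/a" when no key matches);
  // lines without a colon are skipped
  def convert(columns: List[String]): List[(String, String)] =
    columns.collect { case ColumnLine(name, gpType) =>
      (name, hiveType(gpType).getOrElse("n/a"))
    }

  // Prints one tab-separated line per column, like gpDetails above
  def gpDetails(columns: List[String]): Unit =
    convert(columns).foreach { case (name, hive) => println(s"$name\t$hive") }
}
```

With the dataMap and gpSchema values from the answer, new GpToHiveMapper(dataMap).gpDetails(gpSchema) would print one line per column. Returning Option from hiveType keeps the "no match" case explicit, so callers can decide whether "n/a", a default such as "string", or an error is the right fallback.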