I have a Map whose keys are RDBMS data types and whose values are the corresponding Hive data types.
var dataMap:Map[String, String] = dataMapper
for((k,v) <- dataMap) {
println(k + "->"+ v)
}
Output:
character varying->string
character\([0-9]{1,3}\)->string
timestamp without time zone->timestamp
name->string
timestamp\([0-9]{1,3}\) without time zone->timestamp
timestamp with time zone->timestamp
timestamp->timestamp
real->double
character varying\([0-9]{1,4}\)->string
numeric\([0-9]{1,3},[1-9][0-9]{0,2}\)->double
smallint->int
timestamp\([0-9]{1,3}\) with time zone->timestamp
timestamp\([0-9]{1,3}\)->timestamp
unknown->string
text->string
time without time zone->timestamp
bpchar->string
date->date
character->string
numeric->double
numeric\([0-9]{1,3},0\)->bigint
integer->int
bigint->bigint
time with time zone->timestamp
double precision->double
I also have a list of column names with their data types (the data types come from a Greenplum database (RDBMS)), as shown below:
*Column Name Datatype*
forecast_id bigint
period_year numeric(15,0)
period_name character varying(15)
org character varying(10)
ledger_id bigint
currency_code character varying(15)
source_system_name character varying(30)
db_source_system_name character varying(30)
year character varying(256)
ptd_balance numeric
xx_creation_tms timestamp without time zone
xx_last_update_log_id integer
xx_data_hash_code character varying(32)
xx_pk_id bigint
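I need to change each column's data type by checking whether the map dataMap contains that data type as a key. If it does, I take the mapped value and pair it with the column name. This is the code I run: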
class ChangeDataTypes(var gpColumnDetails: List[String], var dataMapper:Map[String, String]) {
var recGpDet:ListBuffer[String] = gpColumnDetails.to[ListBuffer]
var dataMap:Map[String, String] = dataMapper
def gpDetails(): Unit = {
val schemaString:List[String] = recGpDet.map(s => s.split(":")).map(s => s(0) + " " + dMap(s(1))).toList
for(i <- schemaString) {
println(i)
}
}
def dMap(rdbmsColDataType: String): String ={
var hiveDataType:String=null
if(dataMap.keysIterator.contains(rdbmsColDataType)) {
dataMap(rdbmsColDataType)
}
hiveDataType
}
}
When I run the code, I get the following output:
forecast_id bigint
period_year null
period_name null
org null
ledger_id bigint
currency_code null
source_system_name null
db_source_system_name null
year null
ptd_balance double
xx_creation_tms timestamp
xx_last_update_log_id int
xx_data_hash_code null
xx_pk_id null
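The correct values in the output appear because the Map contains that exact String as a key. I get null for the types whose keys are regular expressions, such as character varying\([0-9]{1,4}\), numeric\([0-9]{1,3},[1-9][0-9]{0,2}\), numeric\([0-9]{1,3},0\), and so on.
Could anyone let me know how to write a condition that matches a data type against all the keys in dataMap?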
Answer 0: (score: 1)
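To look up a value by key in dataMap, you first need to map each Greenplum data type to the key format used in dataMap. This can be done by using Regex to match each Greenplum data type against the dataMap keys, as in the following example (only a subset of dataMap is assembled):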
val dataMap: Map[String, String] = Map(
"character varying" -> "string",
"character\\([0-9]{1,3}\\)" -> "string",
"character varying\\([0-9]{1,4}\\)" -> "string",
"timestamp without time zone" -> "timestamp",
"timestamp" -> "timestamp",
"numeric" -> "double",
"numeric\\([0-9]{1,3},0\\)" -> "bigint",
"integer" -> "int",
"bigint" -> "bigint"
)
val gpSchema: List[String] = List(
"forecast_id: bigint",
"period_year: numeric(15,0)",
"period_name: character varying(15)",
"org: character varying(10)",
"ledger_id: bigint",
"currency_code: character varying(15)",
"source_system_name: character varying(30)",
"db_source_system_name: character varying(30)",
"year: character varying(256)",
"ptd_balance: numeric",
"xx_creation_tms: timestamp without time zone",
"xx_last_update_log_id: integer",
"xx_data_hash_code: character varying(32)",
"xx_pk_id: bigint"
)
val patterns = dataMap.keySet
gpSchema.
  map( _.split(":\\s*") match { case Array(x: String, y: String) => (x, y) } ).
  map{ case (k, v) =>
    // Skip every key whose regex does not match the whole type string
    val vkey = patterns.dropWhile{ p => v != p.r.findFirstIn(v).getOrElse("") }.
      headOption match {
        case Some(p) => p
        case None => ""
      }
    (k, dataMap.getOrElse(vkey, "n/a"))
  }
// res1: List[(String, String)] = List(
// (forecast_id,bigint), (period_year,bigint), (period_name,string), (org,string),
// (ledger_id,bigint), (currency_code,string), (source_system_name,string),
// (db_source_system_name,string), (year,string), (ptd_balance,double),
// (xx_creation_tms,timestamp), (xx_last_update_log_id,int), (xx_data_hash_code,string),
// (xx_pk_id,bigint)
// )
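The dropWhile/headOption combination above amounts to "find the first key whose regex matches the whole type string". As a minimal sketch of the same idea, the lookup could also be written with collectFirst and String.matches, which succeeds only on a full match. The names sampleMap and hiveTypeFor are hypothetical and not part of the original code, and the snippet is assumed to be pasted into the Scala REPL or run as a script:

```scala
// Hypothetical subset of the mapping; every key is treated as a regex pattern.
val sampleMap: Map[String, String] = Map(
  "numeric" -> "double",
  "numeric\\([0-9]{1,3},0\\)" -> "bigint",
  "character varying\\([0-9]{1,4}\\)" -> "string"
)

// Returns the Hive type of the first key that matches the entire
// Greenplum type string, or "n/a" when no key matches.
def hiveTypeFor(gpType: String): String =
  sampleMap.collectFirst {
    case (pattern, hiveType) if gpType.matches(pattern) => hiveType
  }.getOrElse("n/a")

println(hiveTypeFor("numeric(15,0)"))          // bigint
println(hiveTypeFor("character varying(256)")) // string
println(hiveTypeFor("numeric"))                // double
println(hiveTypeFor("money"))                  // n/a
```

Because String.matches requires the whole input to match, the plain key "numeric" does not accidentally capture "numeric(15,0)".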
To fit the pattern matching above into your existing code, the ChangeDataTypes class can be modified as follows:
class ChangeDataTypes(val gpColumnDetails: List[String], val dataMap: Map[String, String]) {
  def gpDetails(): Unit =
    gpColumnDetails.map(_.split(":\\s*")).map(s => s(0) + "\t" + dMap(s(1))).toList.
      foreach(println)
  def dMap(gpColType: String): String = {
    val patterns = dataMap.keySet
    val mkey = patterns.dropWhile{
      p => gpColType != p.r.findFirstIn(gpColType).getOrElse("")
    }.
      headOption match {
        case Some(p) => p
        case None => ""
      }
    dataMap.getOrElse(mkey, "n/a")
  }
}
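Note that dMap compiles each key into a Regex again on every call. If that matters for a large schema, one possible variation is to compile the keys once and return an Option instead of the "n/a" placeholder. This is only a sketch under those assumptions: TypeResolver and resolve are hypothetical names, and the map passed in is a small illustrative subset.

```scala
import scala.util.matching.Regex

// Hypothetical helper: compiles every key once and keeps it with its Hive type.
class TypeResolver(dataMap: Map[String, String]) {
  private val compiled: List[(Regex, String)] =
    dataMap.toList.map { case (pattern, hiveType) => (pattern.r, hiveType) }

  // Some(hiveType) for the first key matching the whole input, None otherwise.
  def resolve(gpColType: String): Option[String] =
    compiled.collectFirst {
      case (regex, hiveType) if regex.pattern.matcher(gpColType).matches() => hiveType
    }
}

val resolver = new TypeResolver(Map(
  "timestamp" -> "timestamp",
  "timestamp without time zone" -> "timestamp",
  "character varying\\([0-9]{1,4}\\)" -> "string"
))

println(resolver.resolve("character varying(32)"))       // Some(string)
println(resolver.resolve("timestamp without time zone")) // Some(timestamp)
println(resolver.resolve("geometry"))                    // None
```

Returning an Option leaves it to the caller to decide what an unmapped Greenplum type should become, for example resolver.resolve(t).getOrElse("n/a").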