I have a column:
val originalSqlLikePatternMap = Map(
  "item (%) is blacklisted%" -> "BLACK_LIST",
  "%Testing%" -> "TESTING",
  "%purchase count % is too low %" -> "TOO_LOW_PURCHASE_COUNT")
val javaPatternMap = originalSqlLikePatternMap.map(v => v._1.replaceAll("%", ".*") -> v._2)
val df = Seq(
  "Testing(2,4, (4,6,7) foo, Foo purchase count 1 is too low",
  "Foo purchase count (12, 4) is too low ", "#!@", "item (mejwnw) is blacklisted",
  "item (1) is blacklisted, #!@"
).toDF("raw_type")
val converter = (value: String) => javaPatternMap.find(v => value.matches(v._1)).map(_._2).getOrElse("Unknown")
val converterUDF = udf(converter)
val result = df.withColumn("updatedType", converterUDF($"raw_type"))
But it gives:
+---------------------------------------------------------+----------------------+
|raw_type                                                 |updatedType           |
+---------------------------------------------------------+----------------------+
|Testing(2,4, (4,6,7) foo, Foo purchase count 1 is too low|TESTING               |
|Foo purchase count (12, 4) is too low                    |TOO_LOW_PURCHASE_COUNT|
|#!@                                                      |Unknown               |
|item (mejwnw) is blacklisted                             |BLACK_LIST            |
|item (1) is blacklisted, #!@                             |BLACK_LIST            |
+---------------------------------------------------------+----------------------+
But I want "Testing(2,4, (4,6,7) foo, Foo purchase count 1 is too low" to give 2 values, "TESTING, TOO_LOW_PURCHASE_COUNT":
+---------------------------------------------------------+--------------------------------+
|raw_type                                                 |updatedType                     |
+---------------------------------------------------------+--------------------------------+
|Testing(2,4, (4,6,7) foo, Foo purchase count 1 is too low|TESTING, TOO_LOW_PURCHASE_COUNT |
|Foo purchase count (12, 4) is too low                    |TOO_LOW_PURCHASE_COUNT          |
|#!@                                                      |Unknown                         |
|item (mejwnw) is blacklisted                             |BLACK_LIST                      |
|item (1) is blacklisted, #!@                             |BLACK_LIST, Unknown             |
+---------------------------------------------------------+--------------------------------+
Can someone tell me what I am doing wrong?
Answer 0 (score: 2)
OK. There are a few things here.
Regarding find: to get the output you want, you need to check each Row against every regular expression, so find is not the right choice. It returns only:
the first value produced by the iterator satisfying a predicate, if any.
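To make the difference concrete, here is a minimal sketch in plain Scala (no Spark needed). The object name FindVersusCollect, the helpers firstLabel and allLabels, and the two patterns are hypothetical and chosen only for illustration. A Seq is used instead of a Map so that the order of the results is predictable.

```scala
object FindVersusCollect {
  // Hypothetical patterns and labels, chosen only for this illustration.
  val patterns: Seq[(String, String)] = Seq(
    ".*Testing.*" -> "TESTING",
    ".*purchase count .* is too low.*" -> "TOO_LOW_PURCHASE_COUNT"
  )

  // find stops at the first pair whose pattern matches the whole value.
  def firstLabel(value: String): Option[String] =
    patterns.find { case (pattern, _) => value.matches(pattern) }.map(_._2)

  // collect keeps the label of every pair whose pattern matches.
  def allLabels(value: String): Seq[String] =
    patterns.collect { case (pattern, label) if value.matches(pattern) => label }

  def main(args: Array[String]): Unit = {
    val value = "Testing(2,4, (4,6,7) foo, Foo purchase count 1 is too low"
    println(firstLabel(value))                // Some(TESTING)
    println(allLabels(value).mkString(", "))  // TESTING, TOO_LOW_PURCHASE_COUNT
  }
}
```

With find, the second label is never reached once the first pattern has matched; with collect (or filter followed by map), every matching label is kept.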
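Watch your regular expression: %purchase count % is too low % has a space after "low", which is why it does not match the first row. You may also want to reconsider simply replacing % with .* .
So, with those changes, your code would look like this: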
import org.apache.spark.sql.functions.udf
import spark.implicits._ // assumes a SparkSession named spark, as in spark-shell

val originalSqlLikePatternMap = Map(
  "item (%) is blacklisted%" -> "BLACK_LIST",
  "%Testing%" -> "TESTING",
  "%purchase count % is too low%" -> "TOO_LOW_PURCHASE_COUNT")

// Turn each LIKE pattern into a compiled Regex by replacing % with .*
val javaPatternMap = originalSqlLikePatternMap.map(v => v._1.replaceAll("%", ".*").r -> v._2)

val df = Seq(
  "Testing(2,4, (4,6,7) foo, Foo purchase count 1 is too low",
  "Foo purchase count (12, 4) is too low ", "#!@", "item (mejwnw) is blacklisted",
  "item (1) is blacklisted, #!@"
).toDF("raw_type")

// Check the value against every pattern and join the labels of all that match;
// fall back to "Unknown" when none does.
val converter = (value: String) => {
  val res = javaPatternMap.map(v => {
    v._1.findFirstIn(value) match {
      case Some(_) => v._2
      case None => ""
    }
  })
    .filter(_.nonEmpty).mkString(", ")
  if (res.isEmpty) "Unknown" else res
}
val converterUDF = udf(converter)
val result = df.withColumn("updatedType", converterUDF($"raw_type"))
result.show(false)
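On the point about reconsidering the plain replacement of % with .*: characters such as ( and ) in a LIKE pattern become regex metacharacters after the replacement, so "item (%) is blacklisted%" also matches text without literal parentheses. One possible way around this, sketched below under the assumption that only % needs to act as a wildcard (the _ wildcard and escape characters are ignored), is to quote the literal parts with java.util.regex.Pattern.quote. The names LikeToRegex and likeToRegex are hypothetical.

```scala
import java.util.regex.Pattern

object LikeToRegex {
  // Hypothetical helper: converts a SQL LIKE pattern to a Java regex string.
  // Assumption: only % is a wildcard; _ and escape characters are not handled.
  def likeToRegex(like: String): String =
    like
      .split("%", -1)        // limit -1 keeps empty pieces at the start and end
      .map(Pattern.quote)    // literal pieces are quoted so ( ) . etc. match themselves
      .mkString(".*")        // each % becomes .*

  def main(args: Array[String]): Unit = {
    val like = "item (%) is blacklisted%"
    val naive = like.replaceAll("%", ".*")
    val quoted = likeToRegex(like)

    // The naive regex treats ( ) as a group, so text without parentheses matches too.
    println("item 1 is blacklisted".matches(naive))          // true
    println("item 1 is blacklisted".matches(quoted))         // false
    println("item (1) is blacklisted, #!@".matches(quoted))  // true
  }
}
```

Whether this stricter behaviour is wanted depends on what the LIKE patterns are meant to express; the sketch only shows the difference.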
Hope this helps!