Matching a string against a regular expression pattern in Scala

Date: 2018-10-11 15:36:16

Tags: regex scala pattern-matching

I wrote the following regular expression:

val reg = ".+([A-Z_].+).(\\d{4})_(\\d{2})_(\\d{2})_(\\d{2})\\.orc".r 

It should parse the following string: "S3//bucket//TS11_YREDED.2018_09_28_02.orc". The parsing function is:

val dataExtraction: String => Map[String, String] = {
  string: String => {
    string match {
      case reg(filename, year, month, day) =>
                 Map(FILE_NAME-> filename, YEAR -> year, MONTH -> month, DAY -> day)
      case _  => Map(FILE_NAME-> filename,YEAR -> "", MONTH -> "", DAY -> "")
    }
  }
}
val YEAR = "YEAR"
val MONTH = "MONTH"
val DAY = "DAY"
val FILE_NAME = "FILE_NAME"

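But it does not work correctly. It should skip the bucket name and extract the file name and the date.

So the expected output is: Map(FILE_NAME -> TS11_YREDED, YEAR -> 2018, MONTH -> 09, DAY -> 28). Any idea how to fix it?

2 answers:

Answer 0 (score: 0)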

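The leading .+ is greedy: it first grabs the whole string and then backtracks only as far as needed, so ([A-Z_].+) captures only the minimum that still lets the subsequent subpatterns match.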
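To make the backtracking concrete, here is a minimal sketch that runs the pattern from the question against the sample path and returns whatever each group captured. The object name GreedyDemo and the helper groupsOf are made up for illustration; only the pattern itself comes from the question.

```scala
object GreedyDemo {
  // The pattern from the question, unchanged (five capturing groups).
  val original = ".+([A-Z_].+).(\\d{4})_(\\d{2})_(\\d{2})_(\\d{2})\\.orc".r

  // Hypothetical helper: all captured groups of the first match, if any.
  def groupsOf(s: String): Option[List[String]] =
    original.findFirstMatchIn(s).map(_.subgroups)
}
```

For "S3//bucket//TS11_YREDED.2018_09_28_02.orc", GreedyDemo.groupsOf returns Some(List(ED, 2018, 09, 28, 02)): the greedy .+ swallows everything up to "TS11_YRED" and leaves only "ED" for group 1.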

You can use:

"""(?:.*/)?(.*)\.(\d{4})_(\d{2})_(\d{2})_\d{2}\.orc""".r

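See this regex demo.

Note that a dot must be escaped to match a literal dot.

Details

  • (?:.*/)? - an optional non-capturing group: any 0+ characters other than line break characters, as many as possible, up to and including the last /
  • (.*) - Capturing group 1: any 0+ characters other than line break characters, as many as possible
  • \. - a dot
  • (\d{4}) - Capturing group 2: four digits
  • _ - an underscore
  • (\d{2}) - Capturing group 3: two digits
  • _ - an underscore
  • (\d{2}) - Capturing group 4: two digits
  • _\d{2}\.orc - an underscore, two digits, and .orc at the end of the string.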
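The breakdown above can be checked with a small sketch. It assumes a hypothetical wrapper object PatternParts with a helper named groups; the pattern is the one from this answer. Because the prefix group is optional and non-capturing, the same four groups come back whether or not the input has a bucket path, and the escaped dot rejects a name that uses another separator.

```scala
object PatternParts {
  // The pattern from this answer: four capturing groups.
  val reg = """(?:.*/)?(.*)\.(\d{4})_(\d{2})_(\d{2})_\d{2}\.orc""".r

  // Hypothetical helper: the four captured groups, or None when the
  // whole string does not match (a Regex extractor matches the full input).
  def groups(s: String): Option[List[String]] = s match {
    case reg(name, year, month, day) => Some(List(name, year, month, day))
    case _                           => None
  }
}
```

PatternParts.groups yields the same list for "S3//bucket//TS11_YREDED.2018_09_28_02.orc" and for the bare "TS11_YREDED.2018_09_28_02.orc", and None for "TS11_YREDED-2018_09_28_02.orc".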

Scala demo

val text = "S3//bucket//TS11_YREDED.2018_09_28_02.orc"
// Four capturing groups: file name, year, month, day (the trailing \d{2} is not captured)
val reg = """(?:.*/)?(.*)\.(\d{4})_(\d{2})_(\d{2})_\d{2}\.orc""".r

// Map keys; these never change, so val rather than var
val YEAR = "YEAR"
val MONTH = "MONTH"
val DAY = "DAY"
val FILE_NAME = "FILE_NAME"

val dataExtraction: String => Map[String, String] = {
  string: String => {
    string match {
      case reg(filename, year, month, day) =>
        Map(FILE_NAME -> filename, YEAR -> year, MONTH -> month, DAY -> day)
      // No match: filename is not bound here, so each key maps to its own name as a placeholder
      case _ => Map(FILE_NAME -> FILE_NAME, YEAR -> YEAR, MONTH -> MONTH, DAY -> DAY)
    }
  }
}

println(dataExtraction(text))
// => Map(FILE_NAME -> TS11_YREDED, YEAR -> 2018, MONTH -> 09, DAY -> 28)

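Since you do not use the value of the last capturing group, you can drop its parentheses and keep just \d{2} in the pattern. The number of capturing groups then equals the number of variables in case reg(filename, year, month, day), which the extractor requires in order to match.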
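The role of the group count can be illustrated with a short sketch. The object GroupCountDemo and its method names are hypothetical; the two patterns differ only in whether the trailing two digits are captured. If you prefer to keep the fifth group, binding it to a wildcard is another option.

```scala
object GroupCountDemo {
  // Five capturing groups: the trailing two digits are captured too.
  val fiveGroups = """(?:.*/)?(.*)\.(\d{4})_(\d{2})_(\d{2})_(\d{2})\.orc""".r
  // Four capturing groups: the trailing two digits are matched but not captured.
  val fourGroups = """(?:.*/)?(.*)\.(\d{4})_(\d{2})_(\d{2})_\d{2}\.orc""".r

  // Keep the fifth group in the pattern and ignore it with a wildcard.
  def withWildcard(s: String): Option[(String, String, String, String)] = s match {
    case fiveGroups(name, y, m, d, _) => Some((name, y, m, d))
    case _                            => None
  }

  // Drop the fifth group from the pattern instead.
  def withoutGroup(s: String): Option[(String, String, String, String)] = s match {
    case fourGroups(name, y, m, d) => Some((name, y, m, d))
    case _                         => None
  }

  // Four binders against five groups: the case never matches.
  def mismatched(s: String): Boolean = s match {
    case fiveGroups(_, _, _, _) => true
    case _                      => false
  }
}
```

On the sample path, withWildcard and withoutGroup return the same tuple, while mismatched returns false even though the regex itself matches the string.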

Answer 1 (score: 0)

Check this out:

val file_name = "TS11_YREDED.2018_09_28_02.orc"
val reg = """(.*?)\.(\d{4})_(\d{2})_(\d{2})_(\d{2})\.orc""".r
// Collect the captured groups of every match into a buffer
val file_details = scala.collection.mutable.ArrayBuffer[String]()
reg.findAllIn(file_name).matchData.foreach(m => file_details.appendAll(m.subgroups))
val names = Array("FILE_NAME", "YEAR", "MONTH", "DAY", "DUMMY")
// Pair each name with its captured value and print the entries
for ((x, y) <- names.zip(file_details).toMap)
  println(x + "->" + y)

// Output (Map iteration order is not guaranteed):
//DUMMY->02
//DAY->28
//FILE_NAME->TS11_YREDED
//MONTH->09
//YEAR->2018
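
The same idea can be written without a mutable buffer. This is only a sketch under assumptions: the object FileDetails and the method parse are hypothetical names, while the pattern and key names are the ones used above. Note that this pattern has no prefix handling, so when it is given the full path from the question, the bucket part stays in FILE_NAME.

```scala
object FileDetails {
  // Pattern and key names as in the answer above.
  val reg = """(.*?)\.(\d{4})_(\d{2})_(\d{2})_(\d{2})\.orc""".r
  val names = Seq("FILE_NAME", "YEAR", "MONTH", "DAY", "DUMMY")

  // Hypothetical helper: pair each key with the matching captured group
  // of the first match; None when nothing matches.
  def parse(s: String): Option[Map[String, String]] =
    reg.findFirstMatchIn(s).map(m => names.zip(m.subgroups).toMap)
}
```

FileDetails.parse("TS11_YREDED.2018_09_28_02.orc") gives a map with FILE_NAME -> TS11_YREDED, whereas for "S3//bucket//TS11_YREDED.2018_09_28_02.orc" the FILE_NAME entry is S3//bucket//TS11_YREDED.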