Extracting specific parts of an alphanumeric string with a regular expression in Scala

Time: 2018-08-02 12:39:58

Tags: regex string scala

I am new to Scala. I have a text file whose lines look like this:

HP,20180720
UPE,20180720
MP,20180720

and so on. In my Scala program, I capture the pattern as:

val pattern = "([A-Z]{2}[A-Z]?]),([0-9]{4})([0-9]{2})([0-9]{2})".r
val pattern(circle,year,month,day) = line

Here, line comes from an iterator over the text file, where each iteration yields one line of the file, e.g. MP,20180720

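Now, in the REPL, I can see that the variables in the pattern hold the required values, but how do I unpack them, access them, or store them in separate variables?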

This is based on the article: https://alvinalexander.com/scala/how-to-extract-parts-strings-match-regular-expression-regex-scala

1 answer:

Answer 0 (score: 1)

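Andrey Tyukin's comment is correct. If we remove the extra ']' (as in the example below) and add a ',' inside the quantifier, turning {2} into {2,} so that it matches 2 or more characters before the ',', then it works: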

scala> val pattern = "([A-Z]{2,}[A-Z]?),([0-9]{4})([0-9]{2})([0-9]{2})".r
pattern: scala.util.matching.Regex = ([A-Z]{2,}[A-Z]?),([0-9]{4})([0-9]{2})([0-9]{2})

scala> val pattern(circle,year,month,day)="UPE,20180720"
circle: String = UPE
year: String = 2018
month: String = 07
day: String = 20
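
To illustrate why the stray ']' matters, here is a minimal sketch comparing the pattern from the question with the same pattern minus the bracket. It relies on the fact that Java's regex engine, which backs scala.util.matching.Regex, treats a ']' outside a character class as a literal character. The names StrayBracketDemo, broken, fixed and matchesWhole are made up for this illustration.

```scala
import scala.util.matching.Regex

object StrayBracketDemo {
  // Pattern from the question, including the stray ']' after the optional letter.
  // Outside a character class, ']' is a literal, so the input must contain "]".
  val broken: Regex = "([A-Z]{2}[A-Z]?]),([0-9]{4})([0-9]{2})([0-9]{2})".r

  // The same pattern with the stray ']' removed.
  val fixed: Regex = "([A-Z]{2}[A-Z]?),([0-9]{4})([0-9]{2})([0-9]{2})".r

  // True when the whole line matches the regex. This is the same requirement
  // that `val pattern(a, b, c, d) = line` imposes before it binds the groups.
  def matchesWhole(regex: Regex, line: String): Boolean =
    regex.pattern.matcher(line).matches()
}
```

With these definitions, StrayBracketDemo.matchesWhole(StrayBracketDemo.broken, "UPE,20180720") is false, which is why the extractor in the question fails with a scala.MatchError, while the fixed pattern matches the same line.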

If you only want to access the month, you can use it like this:

scala> val pattern(_,_,month,_)="UPE,20180720"
month: String = 07
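
As an alternative to the extractor syntax, a single group can also be read by index from a match object. The sketch below is one possible way to do it, not the only one; the object name MonthOnly and the method month are made up for this illustration. Note that findFirstMatchIn looks for a match anywhere in the string, whereas the extractor form requires the whole string to match.

```scala
import scala.util.matching.Regex

object MonthOnly {
  val pattern: Regex = "([A-Z]{2,}),([0-9]{4})([0-9]{2})([0-9]{2})".r

  // Group 3 is the month. Returns None when no match is found in the line,
  // instead of throwing a MatchError as the extractor form would.
  def month(line: String): Option[String] =
    pattern.findFirstMatchIn(line).map(_.group(3))
}
```

Returning an Option leaves it to the caller to decide what to do with lines that do not match.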

The pattern can even be simplified to:

val pattern = """([A-Z]{2,}),(\d{4})(\d{2})(\d{2})""".r

scala> val pattern = """([A-Z]{2,}),(\d{4})(\d{2})(\d{2})""".r
pattern: scala.util.matching.Regex = ([A-Z]{2,}),(\d{4})(\d{2})(\d{2})

scala> val pattern(circle,year,month,day)="UPE,20180720"
circle: String = UPE
year: String = 2018
month: String = 07
day: String = 20

scala> val pattern(_,_,month,_)="UPE,20180720"
month: String = 07
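
Since the question iterates over the lines of a file, it may help to see the same pattern used in a match expression, which allows malformed lines to be skipped rather than throwing a scala.MatchError. This is a minimal sketch under assumptions: the Record case class and the names LineParser, parse and parseAll are hypothetical, and converting the date parts to Int is a choice made here, not something the question asks for.

```scala
import scala.util.matching.Regex

object LineParser {
  // Hypothetical container for the four captured parts of a line.
  final case class Record(circle: String, year: Int, month: Int, day: Int)

  val pattern: Regex = """([A-Z]{2,}),(\d{4})(\d{2})(\d{2})""".r

  // The regex acts as an extractor in a pattern match; the whole line must match.
  def parse(line: String): Option[Record] = line match {
    case pattern(circle, year, month, day) =>
      Some(Record(circle, year.toInt, month.toInt, day.toInt))
    case _ =>
      None // line does not have the expected shape
  }

  // Keeps only the lines that parse; non-matching lines are dropped silently.
  def parseAll(lines: Iterator[String]): List[Record] =
    lines.flatMap(line => parse(line)).toList
}
```

For a real file, the iterator could come from scala.io.Source.fromFile(path).getLines(), where path is whatever file is being read.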