Why am I getting a MatchError?

Asked: 2017-12-11 10:03:19

Tags: regex scala split pattern-matching

I have some SQL insert statements, such as:

insert into songlist (id, artist, title, numone) values (6606, 'TIMI YURO', 'HURT', 0);
insert into songlist (id, artist, title, numone) values (6607, 'TIMI YURO', 'WHAT*S A MATTER BABY', 0);
insert into songlist (id, artist, title, numone) values (6608, 'TIMI YURO', 'MAKE THE WORLD GO AWAY', 0);
insert into songlist (id, artist, title, numone) values (6609, 'HELMUT ZACHARIAS', 'WHEN THE WHITE LILACS BLOOM AGAIN', 0);
insert into songlist (id, artist, title, numone) values (6610, 'JOHN *THE COOL GHOUL* ZACHERLE', 'DINNER WITH DRAC', 0);
insert into songlist (id, artist, title, numone) values (6611, 'MICHAEL ZAGER BAND', 'LET*S ALL CHANT', 0);
insert into songlist (id, artist, title, numone) values (6612, 'ZAGER AND EVANS', 'IN THE YEAR 2525 (EXORDIUM AND TERMINUS)', 1);
insert into songlist (id, artist, title, numone) values (6613, 'RICKY ZAHND / BLUEJEANERS', 'NUTTIN* FOR CHRISTMAS', 0);
insert into songlist (id, artist, title, numone) values (6614, 'WARREN ZEVON', 'WEREWOLVES OF LONDON', 0);
insert into songlist (id, artist, title, numone) values (6615, 'ZOMBIES', 'SHE*S NOT THERE', 0);

I read them in as follows:

val dt_split = bufferedsr.getLines.mkString.split(Pattern.quote("insert into songlist (id, artist, title, numone)"))    


val dt_pt = raw"values \((\d+), '(.*)', '(.*)', (\d+)\);".r

val tmp =  dt_split.map( elem => elem.mkString match {
    case dt_pt (id,artist,title,numone) => (id.toInt, artist, title, numone.toInt) 
  } )

The error is: scala.MatchError: (of class java.lang.String). The full error output was linked from the original post.

Note that val dt_split = bufferedsr.getLines.mkString.split(Pattern.quote("insert into songlist (id, artist, title, numone)")).toList gives:

 values (6606, 'TIMI YURO', 'HURT', 0);
 values (6607, 'TIMI YURO', 'WHAT*S A MATTER BABY', 0);
 values (6608, 'TIMI YURO', 'MAKE THE WORLD GO AWAY', 0);
 values (6609, 'HELMUT ZACHARIAS', 'WHEN THE WHITE LILACS BLOOM AGAIN', 0);
 values (6610, 'JOHN *THE COOL GHOUL* ZACHERLE', 'DINNER WITH DRAC', 0);
 values (6611, 'MICHAEL ZAGER BAND', 'LET*S ALL CHANT', 0);
 values (6612, 'ZAGER AND EVANS', 'IN THE YEAR 2525 (EXORDIUM AND TERMINUS)', 1);
 values (6613, 'RICKY ZAHND / BLUEJEANERS', 'NUTTIN* FOR CHRISTMAS', 0);
 values (6614, 'WARREN ZEVON', 'WEREWOLVES OF LONDON', 0);
 values (6615, 'ZOMBIES', 'SHE*S NOT THERE', 0); 

What am I missing?

2 answers:

Answer 0 (score: 0)

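The main cause of this error is that the text you are splitting starts with the split pattern, so the first element of the result is an empty string: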

scala> "abcd values".split(Pattern.quote("abcd"))
res1: Array[String] = Array("", " values")
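
To connect this to the original failure, here is a minimal sketch that reproduces it. The object and value names are made up for illustration, and the two-statement string stands in for what bufferedsr.getLines.mkString would produce:

```scala
import java.util.regex.Pattern
import scala.util.Try

object SplitDemo {
  val prefix = "insert into songlist (id, artist, title, numone)"
  val dtPt = raw"values \((\d+), '(.*)', '(.*)', (\d+)\);".r

  // Two statements joined with no separator (assumed stand-in for getLines.mkString).
  val text: String =
    prefix + " values (6606, 'TIMI YURO', 'HURT', 0);" +
    prefix + " values (6615, 'ZOMBIES', 'SHE*S NOT THERE', 0);"

  // The text begins with the delimiter, so the first element is "".
  val parts: Array[String] = text.split(Pattern.quote(prefix))

  // A match with a single case has no fallback, so "" raises scala.MatchError.
  // Try captures the exception so it can be inspected instead of crashing.
  val firstAttempt: Try[Int] = Try(parts.head match {
    case dtPt(id, _, _, _) => id.toInt
  })
}
```

Here SplitDemo.parts has three elements, the first of which is empty, and SplitDemo.firstAttempt is a Failure wrapping the MatchError. The empty string is also why the error message shows nothing between "scala.MatchError:" and "(of class java.lang.String)".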

A better approach is to use stripPrefix instead:

bufferedsr.getLines.map(_.stripPrefix("insert into songlist (id, artist, title, numone)"))

This produces an Iterator, but you can convert it to a Seq if you prefer.

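Another error here is that a space character sits between your split pattern and the text your regex expects. Since the regex must match the whole string, the leading space makes the match fail. You can add this space to the prefix you strip: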

bufferedsr.getLines.map(_.stripPrefix("insert into songlist (id, artist, title, numone) "))
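
The effect of that single space can be checked directly. This is only a sketch with illustrative names. It uses the regex's underlying java.util.regex.Pattern to test for a full-string match, which is what a Scala regex extractor requires by default:

```scala
object PrefixDemo {
  val dtPt = raw"values \((\d+), '(.*)', '(.*)', (\d+)\);".r
  val line =
    "insert into songlist (id, artist, title, numone) values (6606, 'TIMI YURO', 'HURT', 0);"

  // Prefix without the trailing space: the remainder starts with " values".
  val withoutSpace: String =
    line.stripPrefix("insert into songlist (id, artist, title, numone)")

  // Prefix with the trailing space: the remainder starts with "values".
  val withSpace: String =
    line.stripPrefix("insert into songlist (id, artist, title, numone) ")

  // matches() succeeds only if the entire string matches the pattern.
  val matchesWithoutSpace: Boolean = dtPt.pattern.matcher(withoutSpace).matches()
  val matchesWithSpace: Boolean = dtPt.pattern.matcher(withSpace).matches()
}
```

Only the second variant matches. Another option would be to keep the shorter prefix and call trim on the remainder.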

In addition, the source file may contain blank lines, especially at the end, so you may also need to filter dt_split.

A complete implementation might look like this:

val dt_split = bufferedsr
  .getLines
  .map(_.stripPrefix("insert into songlist (id, artist, title, numone) "))
  .filter(_.nonEmpty)
  .toSeq

val dt_pt = raw"values \((\d+), '(.*)', '(.*)', (\d+)\);".r

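// elem is already a String, so no mkString is needed.
// Note: a non-empty line that does not fit the pattern would still throw a MatchError here.
val tmp = dt_split.map {
  case dt_pt(id, artist, title, numone) => (id.toInt, artist, title, numone.toInt)
}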
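
If the file might contain lines that do not fit the pattern at all, one possible variation (not part of the answer above) is to use collect instead of map, which silently skips anything the regex rejects. The sketch below uses a small in-memory list as an assumed stand-in for bufferedsr.getLines:

```scala
object CollectDemo {
  val prefix = "insert into songlist (id, artist, title, numone) "
  val dtPt = raw"values \((\d+), '(.*)', '(.*)', (\d+)\);".r

  // Stand-in for bufferedsr.getLines, including a blank line.
  val lines: List[String] = List(
    "insert into songlist (id, artist, title, numone) values (6606, 'TIMI YURO', 'HURT', 0);",
    "",
    "insert into songlist (id, artist, title, numone) values (6612, 'ZAGER AND EVANS', 'IN THE YEAR 2525 (EXORDIUM AND TERMINUS)', 1);"
  )

  // collect applies the partial function only where it is defined,
  // so blank or malformed lines are dropped instead of raising MatchError.
  val parsed: List[(Int, String, String, Int)] =
    lines.map(_.stripPrefix(prefix)).collect {
      case dtPt(id, artist, title, numone) => (id.toInt, artist, title, numone.toInt)
    }
}
```

The trade-off is that malformed lines disappear without notice. If they should be reported, an explicit match with a fallback case is the safer choice.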

Answer 1 (score: 0)

I'm not sure exactly what you are trying to do, but you can use the following code to extract the matches you need from the file:

import scala.io.Source

val linesIterator = Source.fromFile("your_file_path").getLines

val regexPattern = raw".* values \((\d+), '(.*)', '(.*)', (\d+)\);".r

val tupleIterator = linesIterator.flatMap(line => line match {
  case regexPattern(id, artist, title, numone) => Some((id, artist, title, numone))
  case _ => None
})

val tupleList = tupleIterator.toList

tupleList.foreach(println)
// (6606,TIMI YURO,HURT,0)
// (6607,TIMI YURO,WHAT*S A MATTER BABY,0)
// (6608,TIMI YURO,MAKE THE WORLD GO AWAY,0)
// (6609,HELMUT ZACHARIAS,WHEN THE WHITE LILACS BLOOM AGAIN,0)
// (6610,JOHN *THE COOL GHOUL* ZACHERLE,DINNER WITH DRAC,0)
// (6611,MICHAEL ZAGER BAND,LET*S ALL CHANT,0)
// (6612,ZAGER AND EVANS,IN THE YEAR 2525 (EXORDIUM AND TERMINUS),1)
// (6613,RICKY ZAHND / BLUEJEANERS,NUTTIN* FOR CHRISTMAS,0)
// (6614,WARREN ZEVON,WEREWOLVES OF LONDON,0)
// (6615,ZOMBIES,SHE*S NOT THERE,0)
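
Note that the tuples above hold four strings, whereas the question converts id and numone to Int. A sketch of a typed variant follows. The Song case class and the parse helper are hypothetical names introduced only for illustration:

```scala
object SongDemo {
  // Hypothetical record type mirroring the four columns of songlist.
  final case class Song(id: Int, artist: String, title: String, numone: Int)

  // Same pattern as above: the leading ".* " absorbs the "insert into ..." prefix.
  val regexPattern = raw".* values \((\d+), '(.*)', '(.*)', (\d+)\);".r

  // Returns None for blank or non-matching lines instead of throwing.
  def parse(line: String): Option[Song] = line match {
    case regexPattern(id, artist, title, numone) =>
      Some(Song(id.toInt, artist, title, numone.toInt))
    case _ => None
  }
}
```

With this helper, linesIterator.flatMap(SongDemo.parse).toList would yield a List[Song].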