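How to retrieve only words using a regular expression

Time: 2018-09-29 09:20:54

Tags: regex scala

Using a regular expression, how can I retrieve only the words while ignoring all other symbols (such as commas, digits, etc.)?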

val words = text.split("\b([-A-Za-z])+\b")

For example:

This is a nice day, my name is...

I would like to get:

This, is, a, nice, day, my, name, is

and ignore the , and the ...

3 answers:

Answer 0: (score: 2)

Split the string on runs of non-letter characters (hyphens are kept as part of a word):

val words = text.split("[^-A-Za-z]+")
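
Note that split treats its argument as the delimiter, so the pattern has to describe what lies between the words rather than the words themselves. A minimal sketch of this approach follows; the second input string is a hypothetical example added to show one caveat: when the text begins with a separator, the result starts with an empty string that may need to be filtered out.

```scala
val text = "This is a nice day, my name is..."

// Split on runs of characters that are neither ASCII letters nor hyphens.
val words = text.split("[^-A-Za-z]+")
println(words.toList)
// => List(This, is, a, nice, day, my, name, is)

// Hypothetical input that begins with a separator: the first element is empty.
val leading = ", hello world".split("[^-A-Za-z]+")

// Dropping empty strings gives only the words.
val cleaned = leading.filter(_.nonEmpty)
println(cleaned.toList)
// => List(hello, world)
```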

Answer 1: (score: 2)

To extract all words, including hyphenated ones, you can use

"""\b[a-zA-Z]+(?:-[a-zA-Z]+)*\b""".r.findAllIn(s)

To support all Unicode letters, use \p{L} instead of the [a-zA-Z] character class:

val s = "This is a nice day, my name is..."
val res = """\b\p{L}+(?:-\p{L}+)*\b""".r.findAllIn(s)
println(res.toList)
// => List(This, is, a, nice, day, my, name, is)

See the Scala demo.
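
As a variation on the pattern above, the sketch below replaces \b with lookarounds that say "not preceded by a letter" and "not followed by a letter". This is an assumption-driven alternative rather than part of the answer: how \b treats non-ASCII letters can depend on the JDK version, while the lookarounds rely only on \p{L}. The sample text is hypothetical.

```scala
// Hypothetical sample text with hyphenated and non-ASCII words.
val sample = "A well-known café, état-major 42 times..."

// A word is one or more letters, optionally followed by "-letters" groups,
// and must not be adjacent to another letter on either side.
val wordRe = """(?<!\p{L})\p{L}+(?:-\p{L}+)*(?!\p{L})""".r

val found = wordRe.findAllIn(sample).toList
println(found)
// => List(A, well-known, café, état-major, times)
```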

Answer 2: (score: 0)

val p = """[[a-z][A-Z]]+""".r

In the REPL:

scala> val text = "This is a nice day, my name is..."
text: String = This is a nice day, my name is...

scala> p.findAllIn(text).toArray
res24: Array[String] = Array(This, is, a, nice, day, my, name, is)

scala> val text = "This is a nice_day, my_name is..."
text: String = This is a nice_day, my_name is...

scala> p.findAllIn(text).toArray
res26: Array[String] = Array(This, is, a, nice, day, my, name, is)
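
The second REPL example shows that underscores act as separators with this pattern. For comparison, here is a small sketch that contrasts a letters-only class with \w+, which also accepts digits and underscores; the sample text and the value names are hypothetical.

```scala
val sampleText = "This is a nice_day, my_name is 42..."

// ASCII letters only: underscores and digits break or drop tokens.
val lettersOnly = """[a-zA-Z]+""".r
val letterTokens = lettersOnly.findAllIn(sampleText).toList
println(letterTokens)
// => List(This, is, a, nice, day, my, name, is)

// \w matches ASCII letters, digits and underscore, so "nice_day" and "42" survive.
val wordChars = """\w+""".r
val wordTokens = wordChars.findAllIn(sampleText).toList
println(wordTokens)
// => List(This, is, a, nice_day, my_name, is, 42)
```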