Using a regular expression, how can I retrieve only the words while ignoring any other symbols (commas, digits, and so on)?
val words = text.split("\b([-A-Za-z])+\b")
For example, given:
This is a nice day, my name is...
I want to get:
This, is, a, nice, day, my, name, is
while ignoring "," and "...".
Answer 0 (score: 2)
Split the string on runs of characters that are not letters (or hyphens):
val words = text.split("[^-A-Za-z]+")
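One caveat with the split approach: when the text starts with a separator, `split` returns an empty first element. A minimal sketch of handling that, assuming you want a `List[String]` (the `SplitWords` object and `words` helper are illustrative names, not part of the answer):

```scala
object SplitWords {
  // Split on runs of characters that are neither ASCII letters nor hyphens,
  // then drop the empty token produced when the text starts with a separator.
  def words(text: String): List[String] =
    text.split("[^-A-Za-z]+").toList.filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    println(words("This is a nice day, my name is..."))
    // List(This, is, a, nice, day, my, name, is)
    println(words("...leading punctuation"))
    // List(leading, punctuation)
  }
}
```

Because the hyphen is excluded from the separator class, a free-standing hyphen (as in "a - b") survives as its own token; filter it out separately if that matters for your input.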
Answer 1 (score: 2)
To extract all words, including hyphenated ones, you can use
"""\b[a-zA-Z]+(?:-[a-zA-Z]+)*\b""".r.findAllIn(s)
To support all Unicode letters, use the \p{L} character class instead of [a-zA-Z]:
val s = "This is a nice day, my name is..."
val res = """\b\p{L}+(?:-\p{L}+)*\b""".r.findAllIn(s)
println(res.toList)
// => List(This, is, a, nice, day, my, name, is)
See the Scala demo.
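To see what the optional `(?:-\p{L}+)*` group contributes, here is a small sketch on an input containing hyphenated words, a free-standing hyphen, and digits. The `HyphenatedWords` object, the `extract` helper, and the sample sentence are made up for illustration:

```scala
object HyphenatedWords {
  // One or more letters, optionally followed by "-letters" groups,
  // so a hyphen is only kept when it sits between letters.
  private val Word = """\b\p{L}+(?:-\p{L}+)*\b""".r

  def extract(s: String): List[String] = Word.findAllIn(s).toList

  def main(args: Array[String]): Unit = {
    println(extract("A well-known, state-of-the-art day - 42 times"))
    // List(A, well-known, state-of-the-art, day, times)
  }
}
```

Unlike a plain split, the free-standing hyphen and the number never show up as tokens, because every match must start and end with a letter.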
Answer 2 (score: 0)
val p = """[[a-z][A-Z]]+""".r
In the REPL:
scala> val text = "This is a nice day, my name is..."
text: String = This is a nice day, my name is...
scala> p.findAllIn(text).toArray
res24: Array[String] = Array(This, is, a, nice, day, my, name, is)
scala> val text = "This is a nice_day, my_name is..."
text: String = This is a nice_day, my_name is...
scala> p.findAllIn(text).toArray
res26: Array[String] = Array(This, is, a, nice, day, my, name, is)
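For comparison, the union class `[[a-z][A-Z]]` matches the same characters as the more common `[a-zA-Z]`, so this pattern collects plain runs of ASCII letters. A brief sketch of what that means for hyphens, underscores, and digits (the `LetterRuns` object and its helpers are hypothetical names):

```scala
object LetterRuns {
  private val Union = """[[a-z][A-Z]]+""".r
  private val Plain = """[a-zA-Z]+""".r

  // Every maximal run of ASCII letters becomes one token.
  def runs(text: String): List[String] = Union.findAllIn(text).toList
  def plainRuns(text: String): List[String] = Plain.findAllIn(text).toList

  def main(args: Array[String]): Unit = {
    val text = "a well-known nice_day, v2"
    println(runs(text))      // List(a, well, known, nice, day, v)
    println(plainRuns(text)) // same result
  }
}
```

Hyphenated words are split into their parts here, so this variant fits only when hyphens should act as separators.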