In Scala, I need to split a string into an array whose elements are pairs of consecutive words:
"Hello, it is useless text. Hope you can help me."
Result:
[[it is], [is useless], [useless text], [Hope you], [you can], [can help], [help me]]
Another example:
"This is example 2. Just\nskip it."
Result:
[[This is], [is example], [Just skip], [skip it]]
I tried this regular expression:
val re = """[a-zA-Z]+\s[a-zA-Z]+""".r
But the output is:
scala> for (m <- re.findAllIn("Hello, it is useless text. Hope you can help me.")) println(m)
it is
useless text
Hope you
can help
So it skips some of the pairs (for example "is useless" and "you can").
Answer 0 (score: 1)
First split on punctuation and digits, then split each piece on whitespace, then slide over the resulting words.
def doubleUp(txt: String): Array[Array[String]] =
  txt.split("[.,;:\\d]+")
    .flatMap(_.trim.split("\\s+").sliding(2))
    .filter(_.length > 1)
Usage:
val txt1 = "Hello, it is useless text. Hope you can help me."
doubleUp(txt1)
//res0: Array[Array[String]] = Array(Array(it, is), Array(is, useless), Array(useless, text), Array(Hope, you), Array(you, can), Array(can, help), Array(help, me))
val txt2 = "This is example 2. Just\nskip it."
doubleUp(txt2)
//res1: Array[Array[String]] = Array(Array(This, is), Array(is, example), Array(Just, skip), Array(skip, it))
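As an aside on why the regex in the question skips pairs: findAllIn returns non-overlapping matches, so once "it is" has been consumed, the next search starts after "is" and can never produce "is useless". A minimal sketch of one way around that, assuming a word is any run of ASCII letters as in the original pattern, is to put the whole pattern inside a zero-width lookahead and read the two words from capture groups. The object and method names here are made up for illustration.

```scala
object OverlappingPairs {
  // (?<![a-zA-Z]) : only start at the beginning of a word.
  // (?=...)       : lookahead consumes nothing, so the next search can
  //                 begin at the following word; the two words are
  //                 captured in groups 1 and 2.
  private val pair = """(?<![a-zA-Z])(?=([a-zA-Z]+)\s([a-zA-Z]+))""".r

  def pairs(txt: String): List[List[String]] =
    pair.findAllMatchIn(txt)
      .map(m => List(m.group(1), m.group(2)))
      .toList
}
```

Because the pattern still requires exactly one whitespace character between the two words, a word followed by punctuation or a digit does not start a pair, which is what keeps pairs from crossing sentence boundaries in the two sample inputs.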
Answer 1 (score: 1)
First, preprocess the string by processing any escape sequences in it.
scala> val string = "Hello, it is useless text. Hope you can help me."
val preprocessed = StringContext.processEscapes(string)
//preprocessed: String = Hello, it is useless text. Hope you can help me.
or
scala> val string = "This is example 2. Just\nskip it."
val preprocessed = StringContext.processEscapes(string)
//preprocessed: String =
//This is example 2. Just
//skip it.
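A small sketch of what this step does, under the assumption that the input may arrive with a literal backslash followed by n (for example, read from a file) rather than as an ordinary Scala string literal. The object and value names are illustrative only.

```scala
object EscapeDemo {
  // Triple-quoted text keeps the backslash and the 'n' as two characters.
  val raw: String = """Just\nskip it."""

  // processEscapes turns that two-character sequence into a real newline.
  val processed: String = StringContext.processEscapes(raw)

  // An ordinary literal already contains the newline, so processing it
  // again leaves it unchanged.
  val literal: String = "Just\nskip it."
  val unchanged: Boolean = StringContext.processEscapes(literal) == literal
}
```

For strings written as ordinary literals, as in the two examples above, the compiler has already handled the escapes, so this step only matters for text that still contains raw backslash sequences.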
Then filter out the unwanted tokens (empty strings, words with punctuation attached, and so on) and apply the sliding function:
val result = preprocessed.split("\\s").filter(e => !e.isEmpty && !e.matches("(?<=^|\\s)[A-Za-z]+\\p{Punct}(?=\\s|$)") ).sliding(2).toList
//result: List[Array[String]] = List(Array(it, is), Array(is, useless), Array(useless, Hope), Array(Hope, you), Array(you, can), Array(can, help))
Answer 2 (score: 0)
You need to use split to break the string into words separated by non-word characters, then sliding to pair the words up the way you want:
val text = "Hello, it is useless text. Hope you can help me."
text.trim.split("\\W+").sliding(2)
You may also want to handle escape characters, as described in the other answer.
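To see what this produces, here is a minimal sketch that materializes the iterator returned by sliding. The object and method names are made up for illustration, and the two filter calls are additions that guard against empty tokens and one-word input. Note that because punctuation is used only as a separator, the result includes "Hello" and pairs such as (text, Hope) that run across the sentence boundary, which differs from the expected output in the question.

```scala
object SplitAndSlide {
  def pairs(text: String): List[List[String]] =
    text.trim
      .split("\\W+")          // split on runs of non-word characters
      .filter(_.nonEmpty)     // a leading separator would yield an empty first token
      .sliding(2)             // windows of two adjacent words
      .filter(_.length == 2)  // a one-word input yields a single short window
      .map(_.toList)
      .toList
}
```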
Answer 3 (score: -1)
Sorry, I only know Python. I have heard the two are fairly similar. Hopefully you can follow it.
string = "it is useless text. Hope you can help me."
split = string.split(' ')  # splits on space (you can use regex for this)
result = []
no = 0
count = len(split)
for x in range(count):
    no += 1
    if no < count:
        pair = split[x] + ' ' + split[no]  # joins the current word to the next
        result.append(pair)
The output will be:
['it is', 'is useless', 'useless text.', 'text. Hope', 'Hope you', 'you can', 'can help', 'help me.']
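For readers who want the same idea in Scala, a rough equivalent of the loop above pairs the word list with itself shifted by one. This is only a sketch with illustrative names, and like the Python version it splits on single spaces, so punctuation stays attached to the words and pairs cross sentence boundaries.

```scala
object AdjacentPairs {
  def pairs(text: String): List[String] = {
    val words = text.split(" ").toList
    // zip stops at the shorter list, so zipping words with words.drop(1)
    // yields every word together with its successor.
    words.zip(words.drop(1)).map { case (a, b) => s"$a $b" }
  }
}
```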