Question

I need to split a string into an array whose elements are pairs of consecutive words, using Scala:

"Hello, it is useless text. Hope you can help me."

Result:

[[it is], [is useless], [useless text], [Hope you], [you can], [can help], [help me]]

Another example:

"This is example 2. Just\nskip it."

Result: [[This is], [is example], [Just skip], [skip it]]

I tried this regular expression:

val re = """[a-zA-Z]+\s[a-zA-Z]+""".r

But the output is:

scala> for (m <- re.findAllIn("Hello, it is useless text. Hope you can help me.")) println(m)
it is
useless text
Hope you
can help

So it misses some of the pairs.
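The pattern skips pairs because findAllIn returns non-overlapping matches: once "it is" has matched, "is" is already consumed and cannot start the next pair. One possible fix, sketched here, keeps the second word inside a lookahead so it is not consumed. The names OverlappingPairs, pairRe and wordPairs are made up for this illustration and are not part of the question.

```scala
object OverlappingPairs {
  // Group 1 is the current word. Group 2 is captured inside a lookahead,
  // so the second word is not consumed and can start the next match.
  private val pairRe = """\b([a-zA-Z]+)(?=\s+([a-zA-Z]+)\b)""".r

  def wordPairs(txt: String): List[List[String]] =
    pairRe.findAllMatchIn(txt)
      .map(m => List(m.group(1), m.group(2)))
      .toList
}
```

Words followed by punctuation or a digit ("Hello,", "text.", "example 2") fail the lookahead, so no pair crosses those boundaries.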

Answer 1

First split on punctuation and digits, then split each piece on whitespace, then slide a window of two over the result.

def doubleUp(txt :String) :Array[Array[String]] =
  txt.split("[.,;:\\d]+")
     .flatMap(_.trim.split("\\s+").sliding(2))
     .filter(_.length > 1)

Usage:

val txt1 = "Hello, it is useless text. Hope you can help me."
doubleUp(txt1)
//res0: Array[Array[String]] = Array(Array(it, is), Array(is, useless), Array(useless, text), Array(Hope, you), Array(you, can), Array(can, help), Array(help, me))

val txt2 = "This is example 2. Just\nskip it."
doubleUp(txt2)
//res1: Array[Array[String]] = Array(Array(This, is), Array(is, example), Array(Just, skip), Array(skip, it))
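To see what each stage of doubleUp contributes, the same pipeline can be unrolled into named intermediate values. This is only a sketch for inspection: the object name DoubleUpSteps and the value names are invented here, and List is used instead of Array so the values compare and print readably.

```scala
object DoubleUpSteps {
  val txt = "This is example 2. Just\nskip it."

  // Step 1: cut at runs of punctuation and digits, so "2." ends a chunk.
  val chunks: List[String] = txt.split("[.,;:\\d]+").toList

  // Step 2: split each chunk into words on whitespace (a newline counts).
  val words: List[List[String]] = chunks.map(_.trim.split("\\s+").toList)

  // Step 3: slide a window of two over each chunk and drop one-word leftovers.
  val pairs: List[List[String]] =
    words.flatMap(_.sliding(2)).filter(_.length > 1)
}
```

Because the sliding window runs inside each chunk, no pair spans a sentence boundary.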

Answer 2

First, preprocess the string so that any escape sequences in it are handled.

scala> val string = "Hello, it is useless text. Hope you can help me."
val preprocessed = StringContext.processEscapes(string)
//preprocessed: String = Hello, it is useless text. Hope you can help me.

OR

scala> val string = "This is example 2. Just\nskip it."
val preprocessed = StringContext.processEscapes(string)
//preprocessed: String =
//This is example 2. Just
//skip it.
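A small sketch of when this step matters, assuming the text may arrive with a literal backslash followed by n (for example, read from a file) rather than a real newline. In an ordinary string literal the compiler has already interpreted \n, so processEscapes leaves it unchanged. The object and value names are hypothetical.

```scala
object EscapeDemo {
  // In a normal literal the compiler has already turned \n into a newline.
  val cooked = "Just\nskip it."
  // Here the text holds a backslash followed by the letter n.
  val raw = "Just\\nskip it."

  val cookedAfter: String = StringContext.processEscapes(cooked) // unchanged
  val rawAfter: String = StringContext.processEscapes(raw)       // real newline now
}
```

Splitting raw on whitespace would treat "Just\nskip" as one token, while rawAfter splits into three words.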

Then filter out the unwanted tokens (empty strings, words with trailing punctuation, and so on) and use the sliding function:

val result = preprocessed.split("\\s").filter(e => !e.isEmpty && !e.matches("(?<=^|\\s)[A-Za-z]+\\p{Punct}(?=\\s|$)") ).sliding(2).toList

//result: List[Array[String]] = List(Array(it, is), Array(is, useless), Array(useless, Hope), Array(Hope, you), Array(you, can), Array(can, help))

Answer 3

You need to use split to break the string into words separated by non-word characters, then sliding to pair up the words the way you want:

val text = "Hello, it is useless text. Hope you can help me."

text.trim.split("\\W+").sliding(2)

You may also want to handle escape characters, as described in the other answer.
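Note that sliding returns an Iterator, so the pairs have to be materialized before they can be inspected. A minimal sketch of doing so, joining each pair into one string (the object name NonWordSplit is invented for this example):

```scala
object NonWordSplit {
  val text = "Hello, it is useless text. Hope you can help me."

  // split("\\W+") discards punctuation, so every word lands in one flat array.
  val pairs: List[String] =
    text.trim.split("\\W+").sliding(2).map(_.mkString(" ")).toList
}
```

Because the words end up in a single flat array, this version also yields "Hello it" and "text Hope", which the question's expected output leaves out.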

Answer 4

Sorry, I only know Python. I've heard the two are fairly similar, so I hope you can follow it.

string = "it is useless text. Hope you can help me."

split = string.split(' ')  # splits on space (you can use a regex for this)
result = []
no = 0  # index of the word after the current one
count = len(split)

for x in range(count):
    no += 1
    if no < count:
        pair = split[x] + ' ' + split[no]  # joins the current word to the next
        result.append(pair)

print(result)

The output will be:

['it is', 'is useless', 'useless text.', 'text. Hope', 'Hope you', 'you can', 'can help', 'help me.']
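For readers who want the same idea in Scala, the index loop can be sketched by zipping the word list with itself shifted by one. This is an illustrative translation, not the answerer's code; AdjacentPairs and adjacentPairs are made-up names.

```scala
object AdjacentPairs {
  def adjacentPairs(s: String): List[String] = {
    val words = s.split(" ").toList
    // Pair each word with its successor, like split[x] + ' ' + split[no].
    words.zip(words.drop(1)).map { case (a, b) => s"$a $b" }
  }
}
```

Like the Python version, it splits only on spaces, so punctuation stays attached to the words and pairs cross sentence boundaries.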

Scala regex: split a string into an array of pairs of consecutive words
