How to split a string on consecutive runs of alphabetic or non-alphabetic characters

Time: 2014-05-28 11:58:46

Tags: regex vb.net string split

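I want to split a string according to a specific rule: the result should contain every word (that is, every consecutive run of alphabetic characters) as well as every non-word run between them.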

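To illustrate what I mean, say I have the string "Each of the past 20 nights, John has gone to bed at 11:00 pm." (without the quotes). I would like the split to return this array of strings:
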
{
"Each",
" ",
"of",
" ",
"the",
" ",
"past",
" 20 ",
"nights",
", ",
"John",
" ",
"has",
" ",
"gone",
" ",
"to",
" ",
"bed",
" ",
"at",
" 11:00 ",
"pm",
"."
}

I'm not very familiar with regular expressions, but I'm hoping there is a solution here!

1 Answer:

Answer 0: (score: 1)

This is easy with a regular expression:

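' Needs: Imports System.Linq and Imports System.Text.RegularExpressions
Dim s = "Each of the past 20 nights, John has gone to bed at 11:00 pm."
' The capturing group keeps the matched words; Skip(1) drops the leading empty string.
Dim result = Regex.Split(s, "(\p{L}+)").Skip(1).ToArray()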

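\p{L} matches any Unicode code point in the "letter" category, so (\p{L}+) means: match one or more consecutive letters and, because of the capturing group, keep them in the result. Regex.Split then splits the string on that pattern. Since the input starts with a match, the first element of the returned array is an empty string, which is why Skip(1) is applied.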


The same thing without LINQ:

Dim s = "Each of the past 20 nights, John has gone to bed at 11:00 pm."
Dim tmp = Regex.Split(s, "(\p{L}+)")
Dim result(tmp.Length - 2) As String
Array.Copy(tmp, 1, result, 0, tmp.Length - 1)
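
Both snippets above assume the input starts with a letter and does not end with one: Regex.Split puts an empty string at the start or end of the array whenever a match sits at the start or end of the input. As a possible alternative (not part of the original answer), here is a minimal sketch that collects the runs with Regex.Matches instead, so no empty entries need to be skipped. The module name SplitSketch and the helper SplitRuns are hypothetical names chosen for illustration.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module SplitSketch
    ' Hypothetical helper: returns each run of letters and each run of
    ' non-letters as its own array element, in order of appearance.
    Function SplitRuns(input As String) As String()
        ' \p{L}+ matches one or more letters; \P{L}+ matches one or more non-letters.
        Dim matches = Regex.Matches(input, "\p{L}+|\P{L}+")
        Return matches.Cast(Of Match)().Select(Function(m) m.Value).ToArray()
    End Function

    Sub Main()
        Dim parts = SplitRuns("John has gone to bed at 11:00 pm.")
        ' Prints: John| |has| |gone| |to| |bed| |at| 11:00 |pm|.
        Console.WriteLine(String.Join("|", parts))
    End Sub
End Module
```

Because every character is either a letter or a non-letter, the two alternatives together cover the whole input, so joining the returned elements gives back the original string.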