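I'm writing a VB program for a lexical analyzer (a small piece of code) that recognizes keywords, identifiers, and strings. I take a string as input and split it into words. This is what I tried: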
' Loop through each line of the source text
For Each line As String In txt_source.Text.Split(New String() _
        {Environment.NewLine}, StringSplitOptions.None)
    ' Loop through each word in that line
    For Each word As String In line.Split()
        If myKeywordList.Contains(word) Then
            txt_output.Text &= word & " is a keyword" & Environment.NewLine
        ElseIf IS_an_Identifier(word) Then
            txt_output.Text &= word & " is an identifier" & Environment.NewLine
        ElseIf word.StartsWith("""") AndAlso word.EndsWith("""") Then
            txt_output.Text &= word & " is a string literal" & Environment.NewLine
        End If
    Next
Next
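This works well. The Split function breaks the string into separate elements on whitespace, but I want it to ignore string literals. For example, when I enter a string literal such as "now is the time", I don't want it broken into substrings; I want it returned as a single word. Is this possible?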
Answer 0: (score: 1)
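You should match rather than split: match either a double-quoted substring ("[^"]*") or (|) a run of non-whitespace characters (\S+). Because the quoted alternative comes first, a quoted phrase is consumed whole before \S+ gets a chance to break it at the spaces.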
"[^"]*"|\S+
See the regex demo.
See also the VB.NET demo:
Imports System
Imports System.Text.RegularExpressions
Imports System.Collections
Public Class Test
    Public Shared Sub Main()
        Dim s As String = "Text ""inside quotes"" here"
        Dim results As MatchCollection = Regex.Matches(s, """[^""]*""|\S+")
        For Each m As Match In results
            Console.WriteLine(m.Value)
        Next
    End Sub
End Class
A one-liner with LINQ (needs Imports System.Linq and Imports System.Collections.Generic):
Dim results As List(Of String) = Regex.Matches(s, """[^""]*""|\S+").Cast(Of Match)().Select(Function(m) m.Value).ToList()
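To tie this back to the loop in the question, here is a minimal sketch of how the matches could replace line.Split(). The keyword set and the IsIdentifier helper below are hypothetical stand-ins for the asker's myKeywordList and IS_an_Identifier, which are not shown in the question, so treat them as assumptions rather than the real definitions.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module LexerSketch
    ' Hypothetical keyword list; substitute the real myKeywordList.
    Private ReadOnly Keywords As New HashSet(Of String)(StringComparer.OrdinalIgnoreCase) From {"Dim", "If", "Then", "End"}

    ' Hypothetical stand-in for IS_an_Identifier: a letter or underscore
    ' followed by letters, digits, or underscores.
    Private Function IsIdentifier(word As String) As Boolean
        Return Regex.IsMatch(word, "^[A-Za-z_][A-Za-z0-9_]*$")
    End Function

    ' Classify one token using the same three branches as the question.
    Function Classify(token As String) As String
        If Keywords.Contains(token) Then
            Return "keyword"
        ElseIf IsIdentifier(token) Then
            Return "identifier"
        ElseIf token.Length >= 2 AndAlso token.StartsWith("""") AndAlso token.EndsWith("""") Then
            Return "string literal"
        End If
        Return "unknown"
    End Function

    Sub Main()
        Dim line As String = "Dim msg = ""now is the time"""
        ' Quoted text is matched first, so it comes back as a single token.
        For Each m As Match In Regex.Matches(line, """[^""]*""|\S+")
            Console.WriteLine(m.Value & " -> " & Classify(m.Value))
        Next
    End Sub
End Module
```

Two limits of the pattern are worth knowing: it does not understand doubled quotes inside a literal, and a quote that is never closed falls through to \S+, so the rest of that line is split on whitespace as before.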