C# (my own programming language) - how to find PRINT STRING multiple times when parsing

Time: 2016-07-12 02:15:49

Tags: c# parsing programming-languages lexical-analysis

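I'm currently writing my own programming language, based on howCode's programming language written in Python. I spent about an hour converting it to C#, and it works well. However, when I tell the parser to parse the tokens we collected, it only handles the first PRINT STRING token it finds and then stops.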

Here is the code for my parser, my lexer, my language script, and the console program:

Parser:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace BL
{
    public static class Parser
    {
        public static void Parse(string toks)
        {
            if (toks.Substring(0).Split(':')[0] == "PRINT STRING")
            {
                Console.WriteLine(toks.Substring(toks.IndexOf('\"') + 1).Split('\"')[0]);
            }
        }
    }
}

Lexer:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace BL
{
    public static class Lexer
    {
        public static string tok = "";
        public static string str;
        public static int state = 0;
        public static string tokens = "";

        public static void Lex(string data)
        {
            foreach (char c in data)
            {
                tok += c;

                if (tok == " ")
                {
                    if (state == 0)
                    {
                        tok = "";
                        tokens += " ";
                    }
                    else if (state == 1)
                    {
                        tok = " ";
                    }
                }
                else if (tok == Environment.NewLine)
                {
                    tok = "";
                }
                else if (tok == "PRINT")
                {
                    tokens += "PRINT";
                    tok = "";
                }
                else if (tok == "\"")
                {
                    if (state == 0)
                    {
                        state = 1;
                    }
                    else if (state == 1)
                    {
                        tokens += "STRING:" + str + "\" ";
                        str = "";
                        state = 0;
                        tok = "";
                    }
                }
                else if (state == 1)
                {
                    str += tok;
                    tok = "";
                }
            }

            Parser.Parse(tokens);
        }
    }
}

My script:

PRINT "HELLO WORLD1" PRINT "HELLO WORLD2"

Console:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;

namespace BL
{
    class Program
    {
        static string data;

        static void Main(string[] args)
        {
            Console.Title = "Compiler";
            string input = Console.ReadLine();
            Open(input);

            Lexer.Lex(data);

            Console.ReadLine();
        }

        public static void Open(string file)
        {
            data = File.ReadAllText(file);
        }
    }
}

When I print the contents of tokens (in the Lexer), I get this:

PRINT STRING:"HELLO WORLD1" PRINT STRING:"HELLO WORLD2"

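However, when I parse it, it only prints HELLO WORLD1, rather than HELLO WORLD1 with HELLO WORLD2 on the line below it. I don't know what I should do to pick up the other PRINT STRING. Since this is a project I created myself, there are no answers online. Thanks in advance.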

1 answer:

Answer 0: (score: 1)

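You're trying to parse a language, which is fine, but in doing so you generate a second programming language: the token string. This means the text your Lex() function produces ends up needing its own parsing logic before anything can act on it.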
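To make that concrete, here is a minimal sketch of what parsing the token string repeatedly would look like. It assumes the exact `PRINT STRING:"..."` format shown in the question; `TokenStringParser` and `ParseAll` are hypothetical names for illustration, not part of the original code.

```csharp
using System;
using System.Collections.Generic;

public static class TokenStringParser
{
    // Collects the text of every PRINT STRING:"..." statement found in
    // the token string, instead of stopping after the first one.
    public static List<string> ParseAll(string toks)
    {
        // Assumed statement prefix, copied from the question's token output
        const string marker = "PRINT STRING:\"";
        var printed = new List<string>();
        int pos = 0;

        while (true)
        {
            // Find the next statement at or after the current position
            int start = toks.IndexOf(marker, pos, StringComparison.Ordinal);
            if (start < 0) break;

            // The string text runs from the end of the marker to the next quote
            int textStart = start + marker.Length;
            int end = toks.IndexOf('"', textStart);
            if (end < 0) break;

            printed.Add(toks.Substring(textStart, end - textStart));

            // Continue scanning after the closing quote
            pos = end + 1;
        }

        return printed;
    }
}
```

This works for the sample script, but every new statement type means more string searching of this kind, which is the extra parsing layer described above.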

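This is why, most of the time, the problem is solved by having Lex() create a list of tokens for the rest of the program to consume. Usually these tokens are more than just strings, but for many small languages a simple list of strings works fine as tokens.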
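As a rough illustration of tokens that are "more than just strings", each token can carry a kind alongside its text. This is only a sketch under my own assumptions: `TokenKind`, `Token`, and `TokenDemo.Run` are hypothetical names, and a lexer that produces these tokens is assumed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token kinds for illustration
public enum TokenKind { Word, String }

public class Token
{
    public TokenKind Kind { get; }
    public string Text { get; }

    public Token(TokenKind kind, string text)
    {
        Kind = kind;
        Text = text;
    }
}

public static class TokenDemo
{
    // Runs PRINT statements from a typed token list and returns the
    // lines that would be printed.
    public static List<string> Run(List<Token> tokens)
    {
        var output = new List<string>();
        for (int i = 0; i < tokens.Count; i++)
        {
            Token t = tokens[i];
            // Only an unquoted word can be a command; a quoted "PRINT" is data
            if (t.Kind == TokenKind.Word && t.Text == "PRINT"
                && i + 1 < tokens.Count
                && tokens[i + 1].Kind == TokenKind.String)
            {
                output.Add(tokens[i + 1].Text);
                i++; // Skip the string we just consumed
            }
            else
            {
                output.Add("ERROR: Unexpected token " + t.Text);
            }
        }
        return output;
    }
}
```

With a plain list of strings, the script `PRINT "PRINT"` produces two identical tokens, so the parser cannot tell the command from the literal. The kind field removes that ambiguity.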

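Since I have a soft spot for toy languages, I modified your example to follow this flow. It loads the file named by the user's input, breaks it into individual tokens, and then uses those tokens to "run" the program: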

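// These methods go inside a class (for example Program) and need:
// using System; using System.Collections.Generic; using System.IO;

// Parse a list of tokens from Lex()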
static void Parse(List<string> tokens)
{
    // Run through each token in the list of tokens
    for (int i = 0; i < tokens.Count; i++)
    {
        // And act on the token
        switch (tokens[i])
        {
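            case "PRINT":
                // PRINT prints the next token
                // Move to the next token first
                i++;
                // Guard against PRINT being the last token in the list
                if (i >= tokens.Count)
                {
                    Console.WriteLine("ERROR: PRINT expects a value");
                    break;
                }
                // And dump it out
                Console.WriteLine(tokens[i]);
                break;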

            default:
                // Anything else is an error, so emit an error
                Console.WriteLine("ERROR: Unknown token " + tokens[i]);
                break;
        }
    }
}

// Lex the source code text, returning a list of tokens
static List<string> Lex(string data)
{
    // The current token we're building up
    string current = "";
    // Are we inside of a quoted string?
    bool inQuote = false;
    // The list of tokens to return
    List<string> tokens = new List<string>();

    foreach (char c in data)
    {
        if (inQuote)
        {
            switch (c)
            {
                case '"':
                    // The string literal has ended: emit it as a token
                    // (even if empty) and note we're no longer in quote
                    tokens.Add(current);
                    current = "";
                    inQuote = false;
                    break;
                default:
                    // Anything else gets added to the current token
                    current += c;
                    break;
            }
        }
        else
        {
            switch (c)
            {
                case '"':
                    // This is the start of a string literal, note that
                    // we're in it and move on
                    inQuote = true;
                    break;
                case ' ':
                case '\n':
                case '\r':
                case '\t':
                    // Tokens are separated by whitespace, so any whitespace
                    // causes the current token to be added to the list of tokens
                    if (current.Length > 0)
                    {
                        // Only add non-empty tokens
                        tokens.Add(current);
                        current = "";
                    }
                    break;
                default:
                    // Anything else is part of a token, just add it
                    current += c;
                    break;
            }
        }
    }

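    // Flush the final token if the input did not end with whitespace
    if (current.Length > 0)
    {
        tokens.Add(current);
    }

    return tokens;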
}

// Quick demo
static void Main(string[] args)
{
    string input = Console.ReadLine();
    string data = File.ReadAllText(input);

    List<string> tokens = Lex(data);
    Parse(tokens);

    Console.ReadLine();
}