Question

I am using the CSharp grammar with the Java target, and I am parsing some C# code like so:

List<Token> codeTokens = new ArrayList<Token>();
List<Token> commentTokens = new ArrayList<Token>();
//CharStream cs = CharStreams.fromString(contents);
CharStream cs = CharStreams.fromPath(path);
CSharpLexer lexer = new CSharpLexer(cs);
// recognition error happens here: 
List<? extends Token> tokens = lexer.getAllTokens();
List<Token> directiveTokens = new ArrayList<Token>();
ListTokenSource directiveTokenSource = new ListTokenSource(directiveTokens);
CommonTokenStream directiveTokenStream = new CommonTokenStream(directiveTokenSource, CSharpLexer.DIRECTIVE);
CSharpPreprocessorParser preprocessorParser = new CSharpPreprocessorParser(directiveTokenStream);

It works fine if my source code is ASCII-encoded. But if it is Unicode, I always get this error, even if the file has nothing in it:

line 1:0 token recognition error at: ''

Do I need to configure my lexer differently? The error comes from Lexer.java => getAllTokens() => nextToken() => getInterpreter().match(_input, _mode);

Again, I get it even with an empty Unicode-encoded file, although that file still contains the U+FEFF character:

$ less ApiUserInfo.cs
<U+FEFF>
ApiUserInfo.cs (END)
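
A possible workaround, offered only as a minimal sketch: if the leading U+FEFF (byte-order mark) is the character the lexer rejects, it could be dropped before the text reaches the lexer. The class name BomStripper and its two methods are hypothetical helpers, not part of ANTLR, and the sketch assumes the file is UTF-8 (a UTF-16 file would need a different charset).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not part of ANTLR): removes a leading byte-order mark.
final class BomStripper {
    // U+FEFF, the byte-order mark that some editors write at the start of a file.
    private static final char BOM = '\uFEFF';

    private BomStripper() {}

    /** Returns the text without a leading U+FEFF, or unchanged if there is none. */
    static String stripBom(String text) {
        if (!text.isEmpty() && text.charAt(0) == BOM) {
            return text.substring(1);
        }
        return text;
    }

    /** Reads a file and drops a leading BOM. Assumption: the file is UTF-8. */
    static String readWithoutBom(Path path) throws IOException {
        // Java's UTF-8 decoder keeps the BOM as U+FEFF, so it is stripped by hand.
        String contents = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        return stripBom(contents);
    }

    public static void main(String[] args) {
        String withBom = "\uFEFFclass A {}";
        System.out.println(stripBom(withBom)); // prints "class A {}"
    }
}
```

The cleaned string could then go to CharStreams.fromString(contents), as in the commented-out line of the snippet above, instead of CharStreams.fromPath(path). Whether this removes the recognition error depends on the BOM really being the offending character.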

Thanks, Angel

ANTLR4 CSharp Lexer generates errors with Unicode source code files

0 answers: