I am trying to use a JavaToken combinator parser to pull out a particular match that sits in the middle of a larger string (i.e. ignore a random set of prefix characters). However I can't get it working, and I think I'm being caught by a greedy parser and/or by CRs/LFs. (The prefix characters can basically be anything.) I have:
class RuleHandler extends JavaTokenParsers {
  def allowedPrefixChars = """[a-zA-Z0-9=*+-/<>!\_(){}~\\s]*""".r
  def findX: Parser[Double] = allowedPrefixChars ~ "(x=" ~> floatingPointNumber <~ ")" ^^ { case num => num.toDouble }
}
and then in my test case:
"when looking for the X value" in {
  "must find and correctly interpret X" in {
    val testString =
      """
        |Looking (only)
        |for (x=45) within
        |this string
      """.stripMargin
    val answer = ruleHandler.parse(ruleHandler.findX, testString)
    System.out.println(" X value is : " + answer.toString)
  }
}
I think it is similar to this SO question. Can anyone see what is wrong? Thanks.
Answer (score 2):
First of all, you shouldn't escape "\s" twice inside """ """:

def allowedPrefixChars = """[a-zA-Z0-9=*+-/<>!\_(){}~\s]*?""".r

In your case, "\\s" is interpreted as "\" and "s" separately (a literal s character, not the \s whitespace class).
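To see the difference concretely, here is a small sketch (my own, not part of the answer) comparing the two escapings; the input strings are made up for illustration:

```scala
// Inside triple quotes no escape processing happens, so """\\s""" is the
// regex \\s (a literal backslash followed by 's'), while """\s""" is the
// whitespace class.
val doubleEscaped = """\\s""".r // matches a literal backslash + 's'
val whitespace    = """\s""".r  // matches any whitespace character

println(doubleEscaped.findFirstIn("a b"))      // None: no literal backslash here
println(whitespace.findFirstIn("a b"))         // Some(" ")
println(doubleEscaped.findFirstIn("""a\sb""")) // Some(\s)
```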
Secondly, your allowedPrefixChars parser includes "(", "x" and "=", so it captures the whole string, including "(x=", and nothing is left for the subsequent parsers.
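You can check that greedy capture directly; this is just an illustration using the question's character class on a made-up fragment of the test string:

```scala
// The prefix class includes '(', 'x' and '=', so as a prefix match it
// consumes the whole line, "(x=45)" included, leaving nothing for "(x=".
val greedyPrefix = """[a-zA-Z0-9=*+-/<>!\_(){}~\s]*""".r

val line = "for (x=45) within"
println(greedyPrefix.findPrefixOf(line)) // Some(for (x=45) within)
```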
The solution is to be more specific about the prefix you want:
object ruleHandler extends JavaTokenParsers {
  def allowedPrefixChar: Parser[String] = """[a-zA-Z0-9=*+-/<>!\_){}~\s]""".r //no "(" here
  def findX: Parser[Double] = rep(allowedPrefixChar | "\\((?!x=)".r) ~ "(x=" ~> floatingPointNumber <~ ")" ^^ { case num => num.toDouble }
}

scala> ruleHandler.parse(ruleHandler.findX, testString)
res14: ruleHandler.ParseResult[Double] = [3.11] parsed: 45.0
Here I have told the parser to ignore any "(" that is followed by "x="; the "\\((?!x=)".r pattern is just a negative lookahead.

An alternative, with a plain regexp:

scala> """\(x=(.*?)\)""".r.findAllMatchIn(testString).map(_.group(1).toDouble).toList
res22: List[Double] = List(45.0)
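The lookahead can also be tried on its own; a minimal sketch with made-up inputs:

```scala
// "\\((?!x=)" matches a '(' only when it is NOT followed by "x=".
val openNotX = "\\((?!x=)".r

println(openNotX.findFirstIn("(only)")) // Some(() : a plain '(' is accepted
println(openNotX.findFirstIn("(x=45)")) // None   : '(' before "x=" is rejected
```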
If you want to use parsers properly, I would recommend describing the whole BNF grammar (including all the possible "(", "=" and "(only)" usages), not just a fragment. For example, if a keyword takes a value, put something like "(" ~> valueName <~ "=" ~ value into the parser:
获取值,请在解析器中加入trait Command
case class Rule(name: String, value: Double) extends Command
case class Directive(name: String) extends Command
class RuleHandler extends JavaTokenParsers { //why `JavaTokenParsers` (not `RegexParsers`) if you don't use tokens from Java Language Specification ?
def string = """[a-zA-Z0-9*+-/<>!\_{}~\s]*""".r //it's still wrong you should use some predefined Java-like literals from **JavaToken**Parsers
def rule = "(" ~> string <~ "=" ~> string <~ ")" ^^ { case name ~ num => Rule(name, num.toDouble} }
def directive = "(" ~> string <~ ")" ^^ { case name => Directive(name) }
def commands: Parser[Command] = repsep(rule | directive, string)
}
Don't forget that scala-parser is intended to give you back an AST, not just some matched value; pure regexps are more appropriate for regular matching over unstructured data. The sketch above shows how to use parsers in the right way (I didn't try to compile it).
If you need to process natural language (Chomsky type-0), scalanlp or something similar is more appropriate.