I'm trying to parse a categories file with PEG.js.
How can I group the categories (a set of non-empty lines followed by a blank line)?
stopwords:fr:aux,au,de,le,du,la,a,et,avec
synonyms:en:flavoured, flavored
synonyms:en:sorbets, sherbets

en:Artisan products
fr:Produits artisanaux

< en:Artisan products
fr:Gressins artisanaux

en:Baby foods
fr:Aliments pour bébé, aliment pour bébé, alimentation pour bébé, aliment bébé, alimentation bébé, aliments bébé

< en:Baby foods
fr:Céréales pour bébé, céréales bébé

< en:Whisky
fr:Whisky écossais
es:Whiskies escoceses
wikipediacategory:Q8718387
At the moment I can parse the file line by line with this grammar:
start = stopwords* synonyms* category+
language_and_words = l:[^:]+ ":" w:[^\n]+ {return {language: l.join(''), words: w.join('')};}
stopwords = "stopwords:" w:language_and_words "\n"+ {return {stopwords: w};}
synonyms = "synonyms:" w:language_and_words "\n"+ {return {synonyms: w};}
category_line = "< "? w:language_and_words "\n"+ {return w;}
category = c:category_line+ {return c;}
I get:
{
"language": "en",
"words": "Artisan products"
},
{
"language": "fr",
"words": "Produits artisanaux"
}
But what I want is (for each group):
[
  {
    "language": "en",
    "words": "Artisan products"
  },
  {
    "language": "fr",
    "words": "Produits artisanaux"
  }
]
I also tried this, but it does not group, and I get a stray "\n" at the start of some lines.
category_line = "< "? w:language_and_words "\n" {return w;}
category = c:category_line+ "\n" {return c;}
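A possible explanation for the stray newline, offered as a guess rather than a confirmed diagnosis: the class in l:[^:]+ excludes only the colon, so it also accepts "\n". When a line is preceded by a blank line, the newline can end up inside the language field. The sketch below uses plain JavaScript regexes as stand-ins for the PEG.js character classes (the names looseLanguage and strictLanguage are made up for illustration):

```javascript
// Sketch only: JavaScript regexes standing in for the PEG.js character classes.
// A negated class such as [^:] accepts "\n" unless the newline is excluded too.
const looseLanguage = /^[^:]+/;    // mirrors l:[^:]+ in language_and_words
const strictLanguage = /^[^:\n]+/; // hypothetical fix: also exclude the newline

// A category line that follows a blank line still has "\n" in front of it.
const sampleAfterBlankLine = "\nfr:Gressins artisanaux";

const looseMatch = looseLanguage.exec(sampleAfterBlankLine);
const strictMatch = strictLanguage.exec(sampleAfterBlankLine);

console.log(JSON.stringify(looseMatch[0])); // prints "\nfr"
console.log(strictMatch);                   // prints null
```

If this is indeed the cause, writing the class as [^:\n]+ in the grammar would make the rule fail at the blank line instead of swallowing it, which leaves the blank line available as a group separator.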
Answer 0 (score: 0)
I found a partial solution:
start = category+
word = c:[^,\n]+ {return c.join('');}
words = w:word [,]? {return w.trim();}
parent = p:"< "? {return (p !== null);}
line = p:parent w:words+ "\n" {return {parent: p, words: w};}
category = l:line+ "\n"? {return l;}
With it I can parse this...
< fr:a,b
fr:aa,bb

en:d,e,f
fr:dd,ee, ffff
and get the groups:
[
[ {...}, {...} ],
[ {...}, {...} ]
]
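But there is still a problem with the "lang:" prefix at the start of each line: if I try to parse "lang:" in the grammar, my categories are no longer grouped...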
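One way around this, as a sketch and not a fix to the grammar itself, is to keep the grouping grammar unchanged and split the "lang:" prefix off in a post-processing step. With the grammar above, the first word of each line still carries the prefix (for example "fr:a"). The helper names splitLanguage and addLanguages are hypothetical, and the code assumes every line has at least one word:

```javascript
// Hypothetical post-processing step: not part of PEG.js or of the grammar above.
// Takes one parsed line, e.g. { parent: true, words: ["fr:a", "b"] }, and moves
// the "lang:" prefix of the first word into its own field.
function splitLanguage(line) {
  const [first, ...rest] = line.words;
  const colon = first.indexOf(":");
  if (colon === -1) {
    // No prefix found: leave the words untouched.
    return { parent: line.parent, language: null, words: line.words };
  }
  const language = first.slice(0, colon);
  const firstWord = first.slice(colon + 1).trim();
  return { parent: line.parent, language, words: [firstWord, ...rest] };
}

// Apply the helper to every line of every group; the grouping is unchanged.
function addLanguages(groups) {
  return groups.map(group => group.map(splitLanguage));
}

// The shape the partial grammar returns for the sample input.
const parsedGroups = [
  [
    { parent: true, words: ["fr:a", "b"] },
    { parent: false, words: ["fr:aa", "bb"] },
  ],
  [
    { parent: false, words: ["en:d", "e", "f"] },
    { parent: false, words: ["fr:dd", "ee", "ffff"] },
  ],
];

console.log(JSON.stringify(addLanguages(parsedGroups), null, 2));
```

This keeps the grammar responsible only for lines and blank-line grouping, at the cost of doing part of the parsing outside PEG.js.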
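Answer 1 (score: 0)
I find it useful to break a parse down iteratively (problem decomposition, old-school Wirth). Here is a partial solution that I think can point you in the right direction (I have not parsed the Line elements of a category).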
start =
  stopwords
  synonyms
  category+

category "category"
  = category:(Line)+ categorySeparator { return category }

stopwords "stopwords"
  = stopwordLine*

stopwordLine "stopword line"
  = stopwordLine:StopWordMatch EndOfLine* { return stopwordLine }

StopWordMatch
  = "stopwords:" match:Text { return match }

synonyms "synonyms"
  = synonymLine*

synonymLine "synonym line"
  = synonymLine:SynonymMatch EndOfLine* { return synonymLine }

SynonymMatch
  = "synonyms:" match:Text { return match }

Line "line"
  = line:Text [\n] { return line }

Text "text"
  = [^\n]+ { return text() }

EndOfLine "(end of line)"
  = '\n'

EndOfFile
  = !. { return "EOF"; }

categorySeparator "separator"
  = EndOfLine EndOfLine* / EndOfLine? EndOfFile
My use of mixed casing is arbitrary and not very stylish. There is also a way to save the solution online: http://peg.arcanis.fr/2WQ7CZ/
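For the part left undone (parsing the Line elements), here is a minimal sketch in plain JavaScript. It assumes each Line string has the form "lang:word, word, ..." with an optional leading "< " for a parent reference, as in the sample file. The function name parseCategoryLine is hypothetical:

```javascript
// Hypothetical helper for the strings that the Line rule returns.
// Assumed input shape: an optional "< " parent marker, a "lang:" prefix,
// then comma-separated words.
function parseCategoryLine(text) {
  const parent = text.startsWith("< ");
  const body = parent ? text.slice(2) : text;
  const colon = body.indexOf(":");
  if (colon === -1) {
    throw new Error("missing 'lang:' prefix in: " + text);
  }
  return {
    parent,
    language: body.slice(0, colon).trim(),
    words: body
      .slice(colon + 1)
      .split(",")
      .map(word => word.trim())
      .filter(word => word !== ""),
  };
}

// One category as the grammar returns it: an array of raw line strings.
const sampleCategory = [
  "< en:Baby foods",
  "fr:Céréales pour bébé, céréales bébé",
];

console.log(sampleCategory.map(parseCategoryLine));
```

The same logic could probably live in the grammar instead, either as dedicated rules or as a function defined in the grammar's initializer block and called from the Line action; keeping it in JavaScript simply avoids touching the grouping rules. Property lines such as wikipediacategory:Q8718387 go through the same path, with the property name ending up in the language field.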