Question

I am trying to use PEG.js to parse a categories file.

How can I group the categories (a run of non-empty lines followed by a blank line)?

stopwords:fr:aux,au,de,le,du,la,a,et,avec

synonyms:en:flavoured, flavored

synonyms:en:sorbets, sherbets

en:Artisan products
fr:Produits artisanaux

< en:Artisan products
fr:Gressins artisanaux

en:Baby foods
fr:Aliments pour bébé, aliment pour bébé, alimentation pour bébé, aliment bébé, alimentation bébé, aliments bébé

< en:Baby foods
fr:Céréales pour bébé, céréales bébé

< en:Whisky
fr:Whisky écossais
es:Whiskies escoceses
wikipediacategory:Q8718387

Right now I can parse the file line by line with this grammar:

start = stopwords* synonyms* category+

language_and_words = l:[^:]+ ":" w:[^\n]+ {return {language: l.join(''), words: w.join('')};}

stopwords = "stopwords:" w:language_and_words "\n"+ {return {stopwords: w};}

synonyms = "synonyms:" w:language_and_words "\n"+ {return {synonyms: w};}

category_line = "< "? w:language_and_words "\n"+ {return w;}

category = c:category_line+ {return c;}

I get:

{
    "language": "en",
    "words": "Artisan products"
},
{
    "language": "fr",
    "words": "Produits artisanaux"
}

But I want (for each group):

[
    {
        "language": "en",
        "words": "Artisan products"
    },
    {
        "language": "fr",
        "words": "Produits artisanaux"
    }
]

I also tried this, but it does not group the lines, and some lines come back with a stray newline at the beginning.

category_line = "< "? w:language_and_words "\n" {return w;}

category = c:category_line+ "\n" {return c;}

Answer 1

I found a partial solution:

start = category+

word = c:[^,\n]+ {return c.join('');}

words = w:word [,]? {return w.trim();}

parent = p:"< "? {return (p !== null);}

line = p:parent w:words+ "\n" {return {parent: p, words: w};}

category = l:line+ "\n"? {return l;}

With it I can parse this...

< fr:a,b
fr:aa,bb

en:d,e,f
fr:dd,ee, ffff

and get the lines grouped:

[
    [ {...}, {...} ],
    [ {...}, {...} ]
]

But there is still the problem of the "lang:" prefix at the start of each line: if I try to parse "lang:", my categories are no longer grouped...
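
One likely reason the grouping is lost once "lang:" is parsed, assuming the language is matched with the question's [^:]+: that class excludes only the colon, so it also matches "\n" and runs across the blank line into the next category. The sketch below uses a JavaScript regular expression as a stand-in for the PEG.js character class (a negated class behaves the same way in both here). The variable rest is a hypothetical slice of the input starting at the blank line, and [^:\n]+ is a suggested replacement, not something taken from the posts above.

```javascript
// Stand-in for the PEG.js class: [^:] matches every character except ":",
// and that includes "\n".
const greedyLanguage = /^[^:]+/;      // like l:[^:]+ in language_and_words
const lineBoundLanguage = /^[^:\n]+/; // suggested fix: also stop at line ends

// Hypothetical input position: right after "fr:Produits artisanaux\n",
// i.e. at the blank line that separates two categories.
const rest = "\n< en:Artisan products\nfr:Gressins artisanaux\n";

const greedy = greedyLanguage.exec(rest);
const lineBound = lineBoundLanguage.exec(rest);

// The greedy class swallows the blank line and the "< " marker.
console.log(JSON.stringify(greedy[0])); // "\n< en"

// The line-bound class refuses to match at the blank line.
console.log(lineBound); // null
```

With the line-bound class, a category line fails to match at the blank line, so the repetition of lines stops there and the blank line is left for the category rule to consume. The category rule still has to accept end of input after the last group.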

Answer 2

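I find it useful to break the parsing problem down iteratively (problem decomposition, old-school Wirth). Here is a partial solution that I think can point you in the right direction (I did not parse the Line elements of a category).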

start = 
  stopwords 
  synonyms 
  category+

category "category"
  = category:(Line)+ categorySeparator { return category }

stopwords "stopwords"
  = stopwordLine*

stopwordLine "stopword line"
  = stopwordLine:StopWordMatch EndOfLine* { return stopwordLine }

StopWordMatch 
  = "stopwords:" match:Text { return match }

synonyms "synonyms"
  = synonymLine*

synonymLine "synonym line"
  = synonymLine:SynonymMatch EndOfLine* { return synonymLine }

SynonymMatch 
  = "synonyms:" match:Text { return match }

Line "line"
  = line:Text [\n] { return line }

Text "text"
  = [^\n]+ { return text() }

EndOfLine "(end of line)"
  = '\n'

EndOfFile 
  = !. { return "EOF"; }

categorySeparator "separator"
  = EndOfLine EndOfLine* / EndOfLine? EndOfFile

My use of mixed case in the rule names is arbitrary and not very stylish. There is also a way to save solutions online: http://peg.arcanis.fr/2WQ7CZ/
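
Since the grammar above returns each Line as raw text, the remaining step can be done in plain JavaScript after parsing. The following is only a sketch: parseLine is a hypothetical helper, and the sample array stands in for the list of categories the parser would return (with no action on start, that should be the third element of the parse result, but treat that as an assumption).

```javascript
// Hypothetical helper: turns one raw line into { parent, language, words }.
function parseLine(text) {
  // A leading "< " marks a reference to a parent category.
  const parent = text.startsWith("< ");
  const body = parent ? text.slice(2) : text;

  // Everything before the first ":" is the language code.
  const colon = body.indexOf(":");
  if (colon === -1) {
    throw new Error("expected 'language:words' but got: " + text);
  }

  return {
    parent: parent,
    language: body.slice(0, colon),
    // Words are comma-separated; surrounding spaces are dropped.
    words: body.slice(colon + 1).split(",").map((w) => w.trim()),
  };
}

// Assumed shape of the grouped output: an array of categories,
// each one an array of raw line strings.
const parsed = [
  ["en:Artisan products", "fr:Produits artisanaux"],
  ["< en:Baby foods", "fr:Céréales pour bébé, céréales bébé"],
];

// Keep the grouping, replace each raw line with a structured object.
const categories = parsed.map((lines) => lines.map(parseLine));
console.log(JSON.stringify(categories, null, 2));
```

The same function could also be called from the Line rule's action if it is defined in the grammar's initializer block, which keeps everything inside the parser.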

How to group non-empty lines with PEG.js
