I am trying to split a text into several smaller blocks so that I can parse it with JavaScript and regex. I have given it my best shot here:
https://regex101.com/r/jfzTlr/1
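I have a set of rules to follow: I want to receive blocks. Each block starts with an asterisk (*) as its first character (when there is no indentation; otherwise the asterisk is preceded by a tab), followed by 2-3 uppercase letters, a comma, a (possible) space, and a code that can be A, R, T, RS or RSS. After that comes an optional dot, then a line break, and then the text. The text ends at the next occurrence of an asterisk that follows the same pattern as above.
Can anyone help me figure out how to split this? Here is my pattern so far: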
[^\t](.{2,3}),\s?.{1,3}\.?\n.*
Thanks a lot!
Answer 0 (score: 1)
You can use
^[ \t]*\*[A-Z]{2,3},\s*(?:[ART]|RSS?)\.?[\n\r](?:(?!^[ \t]*\*[A-Z]{2,3},\s*(?:[ART]|RSS?)\.?)[\s\S])+
Broken down:
^[ \t]*\*[A-Z]{2,3} # start of the line, spaces or tabs, an asterisk and 2-3 UPPERCASE letters
,\s*(?:[ART]|RSS?)\.?[\n\r] # comma, optional whitespace, code, optional dot and newline
(?: # non-capturing group
(?!^[ \t]*\*[A-Z]{2,3},\s*(?:[ART]|RSS?)\.?)
# neg. lookahead with the same pattern as above
[\s\S] # \s + \S = effectively matching every character
)+
This technique is called a tempered greedy token.
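The pattern above can be applied in JavaScript roughly as follows. This is a minimal sketch: the `g` and `m` flags are an assumption (the answer does not state them, but `^` only matches at each line start with `m`), and the names `header`, `blockRe` and `sample` as well as the shortened sample text are made up for illustration.

```javascript
// Header part of the answer's pattern, kept as a raw string so the
// backslashes reach the RegExp constructor unchanged.
const header = String.raw`^[ \t]*\*[A-Z]{2,3},\s*(?:[ART]|RSS?)\.?`;

// Header + newline, then any character that does not begin a new header.
// Flags (assumed): g = find every block, m = let ^ match at each line start.
const blockRe = new RegExp(`${header}[\\n\\r](?:(?!${header})[\\s\\S])+`, "gm");

// Hypothetical sample input, shortened from the question's description.
const sample = [
  "*GW, A",
  "First block.",
  "*JP, R.",
  "Second block,",
  "*not a header, just text.",
  "\t*GW, RSS.",
  "Indented block.",
].join("\n");

const blocks = sample.match(blockRe) ?? [];
console.log(blocks.length); // 3
console.log(blocks);
```

Each match includes its header line and the trailing newline before the next header, so calling `trim()` on the entries may be useful before further processing.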
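Answer 1 (score: 1)
Since you are using JavaScript, why not do the splitting with split, which can return the captured strings along with the separated parts? The headings can then be tied to their blocks in an array that looks like [[heading1, block1], [heading2, block2], ...]
That way you immediately get well-formed data to process further. Just an idea!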
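As a small illustration of the behavior this relies on: when the separator regex contains capture groups, `String.prototype.split` includes the captured text in the returned array. The toy input and the name `parts` below are hypothetical and much simpler than the real data.

```javascript
// Toy input (hypothetical): two headings, each followed by one line of text.
const parts = "*AB\nfirst\n*CD\nsecond".split(/\*([A-Z]{2})\n/);

// The captured initials appear between the separated text pieces;
// the leading "" is the (empty) text before the first heading.
console.log(parts); // [ '', 'AB', 'first\n', 'CD', 'second' ]
```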
const s = `*GW, A
This is my very first line. The asterics defines a new block, followed by the initials (2-3 chars), a comma, a (possible) space and a code that could be A, R, T, RS or RSS. Followed by that is an optional dot. Linebreak afterwards, where the text comes.
*JP, R.
New block here, as the line (kind of) starts with an asterics. Indentations with 4 spaces or a tab means that it is a second level thing only, that does not need to be stripped away necessarily.
But as you can see, a block can be devided into several
lines,
even with multiple lines.
*GML, T.
And so we continue...
Let's just make sure that a line can start with an
*asterics, without breaking the whole thing.
*GW, RS
Yet another block here.
*GW, RSS.
And a very final one.
Spread over several lines.
*TA, RS.
First level all of a sudden again.
*PA, RSX
Just a line to check whether RSX is a separate block.
`;
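// Split on each block header. The three capture groups (initials, code,
// optional dot) are kept in the result; slice(1) drops the text before the first header.
const splits = s.split(/\*([A-Z]{2,3}),\s?([AT]|RS{0,2})(\.?)\n/).slice(1);
const grouped = [];
// Each block occupies four consecutive entries: initials, code, dot, body text.
for (let i = 0; i < splits.length; i += 4) {
  const group = splits.slice(i, i + 3);
  // Split the body into lines, trimming surrounding whitespace.
  group[3] = splits[i + 3].trim().split(/\s*[\r\n]+\s*/);
  grouped.push(group);
}
console.log(grouped);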
Answer 2 (score: -1)
Hopefully this is what you want. It works.
([\*\t])+(.{2,3}),\s?.[A,R,T,RS,RSS]{1,3}\.?\n.*