I am trying to search for words that appear enclosed in tilde (~) symbols.
e.g. ~albert~ is a ~good~ boy.
I know that ~.+?~ can do this, and it already works for me. But there are special cases where I need to match nested tilde-delimited phrases.
e.g. ~The ~spectacle~~ was ~broken~
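In the example above, I have to capture 'The spectacle', 'spectacle' and 'broken', respectively. These will be translated either word by word or together with the accompanying article (An, The, whatever). The reason is that in my system: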
1) 'The spectacle' requires a separate translation in specific cases.
2) 'spectacle' also needs a translation in specific cases.
3) IF a translation exists for 'The spectacle', we will use that, ELSE we will use the translation for 'spectacle'.
Another example to explain the problem:
~The ~spectacle~~ was ~broken~, but that was not the same ~spectacle~ that was given to ~me~.
In the example above, I will translate:
1) 'The spectacle' (because a translation case exists for 'The spectacle'; otherwise I would have translated only 'spectacle' on its own)
2) 'broken'
3) 'spectacle'
4) me
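I am having trouble composing an expression that will make sure all of these are captured by my regex. The one I have managed to use so far is '~.+?~'. But I know that with some form of lookahead or lookbehind I can make it work. Can anyone help me?
The most important aspect of this is regression-proofing, which will make sure that existing behavior does not break. If I manage to get it working, I will post it.
N.B. If it helps, for now I will only have instances with one level of nesting that needs to be broken down. So ~~spectacle~~ will be the deepest level (until I need more!!!!!)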
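For reference, here is a minimal Java sketch of what the current pattern returns on the flat and the nested example. The class and method names are made up for illustration; only `java.util.regex` is used. It shows that the lazy match pairs tildes strictly left to right, so the nested case comes back as the wrong fragments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: collects every match of the lazy pattern ~.+?~
class LazyTildeDemo {
    static List<String> findAll(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile("~.+?~").matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Flat case: works as intended, prints [~albert~, ~good~]
        System.out.println(findAll("~albert~ is a ~good~ boy."));
        // Nested case: tildes are paired left to right, prints [~The ~, ~~ was ~]
        System.out.println(findAll("~The ~spectacle~~ was ~broken~"));
    }
}
```

Neither 'The spectacle', 'spectacle' nor 'broken' is captured in the nested case, which is the behavior that needs fixing.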
Answer 0 (score: 2)
I wrote something similar a while ago, but I have not tested it:
(~(?(?=.*?~~.*?~).*?~.*?~.*?~|[^~]+?~))
or
(~(?(?=.*?~[A-Za-z]*?~.*?~).*?~.*?~.*?~|[^~]+?~))
Another alternative:
(~(?:.*?~.*?~){0,2}.*?~)
^^ change to max depth
Use whichever works best.
To support more levels, add some extra .*?~ in the two places where you see a bunch of them.
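Note that the conditional construct `(?(...)...|...)` used in the first two patterns is not supported by `java.util.regex`, so those would need a different engine. For the one-level case from the question, the following is a rough Java sketch that avoids conditionals. It assumes the nested phrase always sits at the end of the outer one (as in `~The ~spectacle~~`), and `knownPhrases` is a hypothetical stand-in for the asker's translation table; neither assumption comes from the patterns above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: one level of nesting only, inner phrase closes with "~~".
class OneLevelTilde {
    // Alternative 1: ~prefix ~inner~~  (groups 1 and 2)
    // Alternative 2: flat ~word~       (group 3)
    private static final Pattern TOKEN =
            Pattern.compile("~([^~]*)~([^~]+)~~|~([^~]+)~");

    static List<String> termsToTranslate(String text, Set<String> knownPhrases) {
        List<String> terms = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            if (m.group(3) != null) {
                terms.add(m.group(3));              // flat ~word~
            } else {
                String phrase = m.group(1) + m.group(2);
                // Prefer the whole phrase if it has a translation, else the inner word
                terms.add(knownPhrases.contains(phrase) ? phrase : m.group(2));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String text = "~The ~spectacle~~ was ~broken~, but that was not the same "
                + "~spectacle~ that was given to ~me~.";
        // prints [The spectacle, broken, spectacle, me]
        System.out.println(termsToTranslate(text, Collections.singleton("The spectacle")));
    }
}
```

With an empty `knownPhrases` set the first entry falls back to `spectacle`. Deeper nesting, or an inner phrase in the middle of the outer one, is outside what this sketch handles.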
If we allow unlimited nesting, how do we know where a group starts and where it ends? A clumsy diagram:
~This text could be nested ~ so could this~ and this~ this ~Also this~
|                          |              |_________|      |         |
|                          |_______________________________|         |
|____________________________________________________________________|
Or:
~This text could be nested ~ so could this~ and this~ this ~Also this~
|                          |              |         |      |_________|
|                          |______________|         |
|___________________________________________________|
The compiler does not know which one to choose:
~The ~spectacle~~ was ~broken~, but that was not the same ~spectacle~ that was given to ~me~.
|    |         ||_____|      |                            |         |
|    |         |_____________|                            |         |
|    |____________________________________________________|         |
|___________________________________________________________________|
Or:
~The ~spectacle~~ was ~broken~, but that was not the same ~spectacle~ that was given to ~me~.
|    |_________||     |______|                            |_________|                   |__|
|_______________|
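Use distinct opening and closing characters (as @tbraun suggested) so that the compiler knows where each group starts and ends:
{This text can be {properly {nested}} without problems} because {the compiler {can {see {the}}} start and end points} easily.
Or use a compiler:
Note: I have not done much Java, so some of this code may be incorrect.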
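import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.apache.commons.lang3.StringUtils;

// myString is the input text; split("") yields one single-character string per element
String[] chars = myString.split("");
int depth = 0;
int lastIndex = 0; // index of the '{' that opened the current top-level group
List<String> results = new ArrayList<String>();
for (int i = 0; i < chars.length; i += 1) {
    if (chars[i].equals("{")) {
        depth += 1;
        if (depth == 1) {
            lastIndex = i;
        }
    }
    if (chars[i].equals("}")) {
        depth -= 1;
        if (depth == 0) {
            // A top-level group just closed: keep it, braces included
            results.add(StringUtils.join(Arrays.copyOfRange(chars, lastIndex, i + 1), ""));
        }
        if (depth < 0) {
            // Balancing problem: handle the error
        }
    }
}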
This uses StringUtils (from Apache Commons Lang).
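The loop above only collects top-level groups. If the nested groups are needed as well (both 'The spectacle' and 'spectacle' in the question), one possible variation is to keep a stack of opening positions instead of a depth counter. This is only a sketch with made-up names, using plain `String` methods rather than StringUtils; it reports each group without its braces, inner groups first.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper: returns the text of every {...} group, at any depth.
class NestedBraces {
    static List<String> allGroups(String text) {
        List<String> groups = new ArrayList<>();
        Deque<Integer> open = new ArrayDeque<>(); // indices of unmatched '{'
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '{') {
                open.push(i);
            } else if (c == '}') {
                if (open.isEmpty()) {
                    throw new IllegalArgumentException("Unbalanced '}' at index " + i);
                }
                int start = open.pop();
                // Drop the braces of any groups nested inside this one
                groups.add(text.substring(start + 1, i).replace("{", "").replace("}", ""));
            }
        }
        if (!open.isEmpty()) {
            throw new IllegalArgumentException("Unbalanced '{' at index " + open.peek());
        }
        return groups;
    }

    public static void main(String[] args) {
        // prints [spectacle, The spectacle, broken]
        System.out.println(allGroups("{The {spectacle}} was {broken}"));
    }
}
```

Because a group is recorded when its closing brace is seen, an inner group always appears in the list before the group that encloses it.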
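Answer 1 (score: -1)
You need something that distinguishes the start of a group from its end, e.g. {}
You can use the pattern \{[^{]*?\} which excludes { inside the match: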
{The {spectacle}} was {broken}
First iteration
{spectacle}
{broken}
Second iteration
{The spectacle}
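The iterations above could be driven by a loop along these lines. This is a sketch, not part of the original answer: it assumes that after each pass the braces of the matched groups are stripped from the text, so the next pass sees the enclosing groups as innermost. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: applies \{[^{]*?\} repeatedly, one list of matches per pass.
class IterativeBraces {
    private static final Pattern INNERMOST = Pattern.compile("\\{([^{]*?)\\}");

    static List<List<String>> iterations(String text) {
        List<List<String>> passes = new ArrayList<>();
        while (true) {
            Matcher m = INNERMOST.matcher(text);
            List<String> found = new ArrayList<>();
            StringBuffer stripped = new StringBuffer();
            while (m.find()) {
                found.add(m.group());
                // Replace "{inner}" with "inner" so the outer group matches next pass
                m.appendReplacement(stripped, Matcher.quoteReplacement(m.group(1)));
            }
            if (found.isEmpty()) {
                return passes; // nothing left to unwrap
            }
            m.appendTail(stripped);
            passes.add(found);
            text = stripped.toString();
        }
    }

    public static void main(String[] args) {
        // prints [[{spectacle}, {broken}], [{The spectacle}]]
        System.out.println(iterations("{The {spectacle}} was {broken}"));
    }
}
```

The number of passes equals the nesting depth, so this handles deeper nesting without changing the pattern.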