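This question is less about a specific regex problem and more about design and approach, so it may take a while to understand the requirements and their dependencies. I have made every effort to keep this fully working yet not elegant solution as simple as possible.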
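I need to optimize text in a messaging platform. The text is created and edited by other people and may need cleaning up with regexes. All optimizations need to be done with a single regex, because they happen often and are expensive (or am I wrong about that?). Furthermore, the regex must be language-agnostic (compatible with at least JavaScript and PHP). Last but not least, the optimized text is used in a text-only environment and must not contain (additional) HTML.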
Optimize lines
Optimize spaces
Optimize comments
Overall
My solution so far combines four regexes that match my requirements; every match is replaced by a single space:
\n(?!\n|[-_.○•♥→›>+%\/*~=] |[a-zA-Z_1-9+][\.|\)|\:|\*])
(The length of this pattern depends on how many list style types I want to support.)
(\n+)(?=\n\n)
+
^\n?\/\/ .+\n
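To make the four patterns easier to compare, here is a minimal sketch that applies each one on its own to a tiny input. The constant and function names are hypothetical. The first pattern uses the "[.):*] " spelling from the combined expression, and the space pattern is written as " +" on the assumption that its leading space was lost in the post.

```javascript
// Sketch only: the four patterns applied one at a time (names are hypothetical).
const joinLinesPattern = /\n(?!\n|[-_.○•♥→›>+%\/*~=] |[a-zA-Z_1-9+][.):*] )/g;
const surplusNewlinesPattern = /(\n+)(?=\n\n)/g;
const spaceRunPattern = / +/g; // assumption: one or more spaces
const commentLinePattern = /^\n?\/\/ .+\n/gm;

// Each helper replaces every match of one pattern with a single space.
const joinSingleNewlines = (text) => text.replace(joinLinesPattern, " ");
const trimSurplusNewlines = (text) => text.replace(surplusNewlinesPattern, " ");
const collapseSpaces = (text) => text.replace(spaceRunPattern, " ");
const dropCommentLines = (text) => text.replace(commentLinePattern, " ");

// JSON.stringify makes newlines and spaces visible in the console.
console.log(JSON.stringify(joinSingleNewlines("one\ntwo")));             // "one two"
console.log(JSON.stringify(joinSingleNewlines("intro:\n- item")));       // "intro:\n- item"
console.log(JSON.stringify(joinSingleNewlines("1. first\n2. second")));  // "1. first\n2. second"
console.log(JSON.stringify(trimSurplusNewlines("a\n\n\n\nb")));          // "a \n\nb"
console.log(JSON.stringify(collapseSpaces("a   b")));                    // "a b"
console.log(JSON.stringify(dropCommentLines("// note\nnext")));          // " next"
```

Checking the patterns in isolation like this makes it easier to see which one is responsible when the combined result is not what was expected.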
To keep the optimization inexpensive, I joined them with | into a single regex that can be used in JavaScript (as well as PHP).
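// Combined pattern: join single newlines | trim surplus newlines | collapse spaces | drop comment lines
const regex = new RegExp(" \n(?!\n|[-_.○•♥→›>+%\/*~=] |[a-zA-Z_1-9+][.):*] )|(\n+)(?=\n\n)| + |^\n?\/\/ .+\n", "gm");
// .value returns the raw text; .innerHTML would return it HTML-escaped (">" becomes "&gt;")
const text = document.getElementById("input").value;
const replacement = " "; // every match becomes a single space
const cleaned = text.replace(regex, replacement);
document.getElementById("output").value = cleaned;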
#input, #output { width: 100%; height: 88vh; }
#input { display: none; } #output { border: none; }
<textarea id="input">
MAKE PARAGRAPHS
This is the first paragraph.
Some sentences end with newlines.
Some don't. We need to cope with that.
This is the second paragraph.
It contains some unnecessary spaces.
Even at the end of a line.
This is the third paragraph.
Some sentences end with question- and exclamation-marks.
I hope that is ok for you. Is it? That's great! Really.
KEEP LISTS
This is an unordered list, starting with a minus+space:
- This is the first item.
- This is the second item.
- This is the third item.
Here is an unordered list, starting with entity|symbol+space:
• This is the second item.
> This is the third item. // Works in php only
* This is the fifth item.
This is a (manually) ordered lists, starting with char|digit+entity+space:
1. This is the first item.
b) This is the second item.
3: This is the third item.
Here is a mathematical list, starting with operators:
+ Plus
- Minus
% Percentage
/ Division
* Multiply
~ Like
= Equal
These are (manually) ordered lists, which are not summed up because they do not end with a space:
1 This is the first item.
b This is the second item.
I like the third item.
First: This works.
Second: It works great.
Third: That is nice!
KEEP HTML
The input text may contain <a href="https://example.com" target="_blank">Html</a>.
The output text must simply keep it for further processing.
The output must not add Html as it is processed in a text-only environment.
I know this sounds stupid, but it isn't.
REMOVE COMMENTS
Single/whole line comments are being removed.
// Sources
// Removing single lines: https://regex101.com/r/qU1eP8/5
// Removing comments: https://www.perlmonks.org/?node_id=996552
// Tests
// Dialog: https://api.sefzig.net/dialog/test/regex/
// Jsbin: https://jsbin.com/goromad/edit?output
// Regex101: https://regex101.com/r/Xz5atA/2
// Regexr: https://regexr.com/45svm
Thank you, regex ♥ // Problem solved
~Fin~
</textarea>
<textarea id="output"><!-- Press "Run" --></textarea>
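The combined expression in the snippet above is hard to read as one long string. A minimal sketch of one alternative is to keep the four patterns as separate regex literals and join their sources with | when building the final regex. The names parts, combined and optimizeText are hypothetical, and the whitespace inside the patterns is an assumption (the space pattern is written as " +").

```javascript
// Sketch only: assemble the single regex from readable parts.
const parts = [
  /\n(?!\n|[-_.○•♥→›>+%\/*~=] |[a-zA-Z_1-9+][.):*] )/, // newline not followed by a blank line or list marker
  /(\n+)(?=\n\n)/,                                     // surplus newlines before a paragraph break
  / +/,                                                // runs of spaces (assumed spelling)
  /^\n?\/\/ .+\n/,                                     // whole-line "// " comments
];

// RegExp.prototype.source gives the pattern text without delimiters or flags.
const combined = new RegExp(parts.map((part) => part.source).join("|"), "gm");

function optimizeText(text) {
  // Every match, whichever alternative produced it, becomes one space.
  return text.replace(combined, " ");
}

console.log(JSON.stringify(optimizeText("one\ntwo")));        // "one two"
console.log(JSON.stringify(optimizeText("intro:\n- item")));  // "intro:\n- item"
console.log(JSON.stringify(optimizeText("a   b")));           // "a b"
```

At each position the alternatives are tried from left to right, so the order of the parts matters. The string is also scanned only once: a space produced by one alternative is not rescanned by the others.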
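Since I am not a regex expert and my approach feels clumsy, I would like to hear your suggestions. I know that regexes are expensive and that everything can be done better.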
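One way to check the cost assumption rather than take it for granted is to time a single combined pass against four separate passes on representative text. The sketch below does this; all names are hypothetical, the patterns are transcribed from the post (with the space pattern assumed to be " +"), and the sample text is made up. The timings depend on the engine and the input, so no result is claimed here.

```javascript
// Sketch only: compare one combined replace with four sequential replaces.
const patternList = [
  /\n(?!\n|[-_.○•♥→›>+%\/*~=] |[a-zA-Z_1-9+][.):*] )/,
  /(\n+)(?=\n\n)/,
  / +/, // assumed spelling of the space pattern
  /^\n?\/\/ .+\n/,
];
const singlePass = new RegExp(patternList.map((p) => p.source).join("|"), "gm");
const separatePasses = patternList.map((p) => new RegExp(p.source, "gm"));

function cleanOnce(text) {
  return text.replace(singlePass, " ");
}

function cleanInSteps(text) {
  // Each pass works on the output of the previous one.
  return separatePasses.reduce((current, re) => current.replace(re, " "), text);
}

function averageMs(fn, text, runs = 200) {
  const start = performance.now();
  for (let i = 0; i < runs; i++) fn(text);
  return (performance.now() - start) / runs;
}

// Made-up sample: plain lines, a run of spaces, a list and a comment line.
const sample = "A line.\nAnother   line.\n- item one\n- item two\n// a comment\n".repeat(50);
console.log("single pass:", averageMs(cleanOnce, sample).toFixed(4), "ms");
console.log("four passes:", averageMs(cleanInSteps, sample).toFixed(4), "ms");
```

The two functions do not necessarily return identical text: the sequential version lets later patterns see the output of earlier ones, while the single pass scans the original string only once. Any comparison should therefore check the results as well as the timings.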
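For clarity, you may want to know some details that I have not mentioned here. You may also want to test my regexes. That is why I set up a sandbox that isolates the requirements (regexes) and contains sample text with all use cases as well as detailed explanations:
If you would like to use the features of these great tools, go ahead: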
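Please help me figure out this important feature of the messaging platform! Feel free to enhance my approach, suggest alternatives, or use the result in your own projects ♥
This is my first question on Stack Overflow. I did a lot of research. Please bear with me if I did anything wrong.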