Question

I have a function that formats an input string as an HTML string, where

* is for <i>
** is for <b>

For example, for the input

Lorem ipsum *dolor sit amet*, consectetur **adipiscing** elit.

the output string should be:

Lorem ipsum <i>dolor sit amet</i>, consectetur <b>adipiscing</b> elit.

I have written this function:

val input = "Lorem ipsum *dolor sit amet*, consectetur **adipiscing** elit."

val tagMap = mapOf(
        '*' to "<i>",
        '♥' to "<b>",
        '♦' to "<s>"
)

val tagMapClose = mapOf(
        '*' to "</i>",
        '♥' to "</b>",
        '♦' to "</s>"
)

fun tagCheck(obj: String): String {
    var str = Regex("""\*\*""").replace(obj, "♥")
    str = Regex("""~~""").replace(str, "♦")
    str = Regex("""\*\*\*""").replace(str, "♥*")
    val charList = str.toList()
    var res = ""
    val indexMap = mutableMapOf<Int, String>()
    var ct = 0

    for ((tag, define) in tagMap) {
        val tagIndex = mutableListOf<Int>()
        var status = true
        for (char in charList) if (char == tag) tagIndex.add(charList.indexOf(char))
        ct = if (tagIndex.size % 2 == 1) tagIndex.size
        else tagIndex.size + 1

        for (i in 0 until ct - 1) {
            if (status) {
                indexMap[tagIndex[i]] = tagMap.getValue(tag)
                status = false
            }
            else if (!status) {
                indexMap[tagIndex[i]] = tagMapClose.getValue(tag)
                status = true
            }
        }
    }
    for (item in charList) {
        res += if (indexMap.keys.contains(charList.indexOf(item))) indexMap[charList.indexOf(item)]
        else item
    }
    return res
}

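But in the output I get:

Lorem ipsum </i>dolor sit amet</i>, consectetur </b>adipiscing</b> elit.

So the function cannot tell an opening tag from a closing one; it only writes closing tags. What should I do?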

Answer 1

I strongly recommend that you use a Markdown parser. Parsers are likely to be more accurate and less affected by edge cases than regex.

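That being said, you can parse this with regex. Your approach fails, though, because of the way you handle the tokens. You convert the Markdown tokens into different single-character tokens (a transformation that is unnecessary in itself), and then map the same token to two different tags, an opening one and a closing one. That, combined with the loop, makes every tag end up as a closing tag: charList.indexOf(char) always returns the index of the first occurrence, so every * is recorded at the same index, the opening tag stored there is immediately overwritten by the closing tag, and the final loop then replaces every * with that closing tag.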
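For comparison, here is a minimal sketch of the question's own toggle approach in which each occurrence keeps its real position (via withIndex()) instead of the first-occurrence index returned by charList.indexOf(char). The function name toggleTags is hypothetical. Like the original, it assumes ♥ and ♦ never occur in the input, and overlapping markers such as *** can still come out mis-nested, which is one reason to switch to a per-token regex or a real parser.

```kotlin
// Hypothetical fixed variant of the question's tagCheck: same placeholder idea,
// but every occurrence keeps its own index instead of charList.indexOf(char).
fun toggleTags(input: String): String {
    // Collapse two-character tokens into placeholder characters, as the question does.
    // Assumes ♥ and ♦ never appear in the input itself.
    val str = input.replace("**", "♥").replace("~~", "♦")
    val openTags = mapOf('*' to "<i>", '♥' to "<b>", '♦' to "<s>")
    val closeTags = mapOf('*' to "</i>", '♥' to "</b>", '♦' to "</s>")
    val literal = mapOf('♥' to "**", '♦' to "~~") // restores unmatched placeholders

    // Position in str -> HTML tag that replaces the character at that position.
    val replacements = mutableMapOf<Int, String>()
    for (tag in openTags.keys) {
        // withIndex() yields every occurrence together with its real position.
        val positions = str.withIndex().filter { it.value == tag }.map { it.index }
        // Occurrences alternate open/close; an odd trailing one stays literal.
        val pairedCount = positions.size - positions.size % 2
        for (i in 0 until pairedCount) {
            replacements[positions[i]] =
                if (i % 2 == 0) openTags.getValue(tag) else closeTags.getValue(tag)
        }
    }

    return buildString {
        for ((index, c) in str.withIndex()) {
            append(replacements[index] ?: literal[c] ?: c.toString())
        }
    }
}
```

With the question's input, toggleTags(input) yields "Lorem ipsum <i>dolor sit amet</i>, consectetur <b>adipiscing</b> elit.".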

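Storing the tokens is not a bad idea, though, so let's start with that. Create a mapping between ** and b, and between * and i (and ~~ and s). There is no need to map *** to bi: the parsing converts it gradually, turning ***content*** first into <b>*content*</b> and then into <b><i>content</i></b>. In this case I key the map by regex to make things easier later:

val tokens = mapOf(
    "\\*\\*" to "b", "\\*" to "i", "~~" to "s"
)

The regex itself is more complicated: it should not match empty content, it should ignore surrounding spaces, and it needs to match two different types of tokens.

Anyway, we have a core regex:

val core = "(?<!\\* )(?=\\S)(.+?[*]*)(?<=\\S)(?<! \\*)"

Note that this does not work on its own; it has to be placed between a pair of tokens. The first and last groups are there to avoid parsing *** something*** as valid, because * is also a valid character inside the content.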

In the example I declared the input as var string, but you can of course replace it with something else; it is only there to keep the example self-contained.

var string = "***this*** **is** a ***test of* a markdown *regex* parsing system** that I really **hope *works* like it's supposed to**. " +
        "And an ** invalid one **, and not to forget ~~the broken regex~~ the perfect regex"

for ((token, html) in tokens) { // Map entries can be destructured like tuples: token is, for instance, \\*\\* and html is b
    val modRegex = "$token$core$token".toRegex() // Construct the regex. This is why the keys in the map are regex-style strings
    string = string.replace(modRegex, "<$html>$1</$html>") // Replace the match. The string templates insert the right tags, and $1 is the first capture group, so no content is lost
}

For demonstration, I also printed string:

<b><i>this</i></b> <b>is</b> a <b><i>test of</i> a markdown <i>regex</i> parsing system</b> that I really <b>hope <i>works</i> like it's supposed to</b>. And an ** invalid one **, and not to forget <s>the broken regex</s> the perfect regex
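The loop above can be wrapped in a reusable function. This is a sketch: the name markdownToHtml is hypothetical, and the patterns are the answer's tokens and core regex unchanged, only written as raw strings so the backslashes reach the regex engine without double escaping. The map order matters, because ** has to be processed before *.

```kotlin
// Order matters: "**" runs before "*", so ***x*** becomes <b>*x*</b> and then <b><i>x</i></b>.
val tokenPatterns = linkedMapOf(
    """\*\*""" to "b",
    """\*""" to "i",
    "~~" to "s"
)

// The answer's core pattern: non-empty content that neither starts nor ends with whitespace.
val corePattern = """(?<!\* )(?=\S)(.+?[*]*)(?<=\S)(?<! \*)"""

// Hypothetical wrapper around the answer's loop.
fun markdownToHtml(input: String): String {
    var result = input
    for ((token, html) in tokenPatterns) {
        val regex = Regex("$token$corePattern$token")
        // "$1" is not a Kotlin template (identifiers cannot start with a digit),
        // so it reaches the regex engine as a reference to capture group 1.
        result = result.replace(regex, "<$html>$1</$html>")
    }
    return result
}
```

Called on the question's input, it returns "Lorem ipsum <i>dolor sit amet</i>, consectetur <b>adipiscing</b> elit.", while spaced-out markers such as "** not bold **" are left untouched.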

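Now, the regex is far from perfect, especially for Markdown. First of all, there are several edge cases, plus improper handling that comes from improper Markdown: you cannot just scatter tokens around and expect them to be interpreted as valid Markdown. Improperly placed tokens can therefore lead to incorrect processing and parsing, which is another reason I strongly recommend a Markdown parser over regex.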

And while this can be extended to other tokens, it does not work for links. A link needs its groups moved around in the HTML to work, and there are two relevant groups: ()[] -> ($1)[$2]. This, again, ignores the alt text on URLs.
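As a rough illustration of moving two groups into the HTML, here is a sketch for the inline link form [text](url). The helper name linksToHtml and the pattern are assumptions, not part of the answer; link titles, nested brackets and image syntax are deliberately not handled.

```kotlin
// Hypothetical helper: [text](url) -> <a href="url">text</a>.
// Group 1 is the link text, group 2 is the URL (no whitespace, no ')').
val linkRegex = Regex("""\[([^\]]+)]\(([^)\s]+)\)""")

fun linksToHtml(input: String): String =
    // In the raw string, $2 and $1 stay literal and act as group references.
    input.replace(linkRegex, """<a href="$2">$1</a>""")
```

For example, linksToHtml("see [docs](https://example.com) now") gives "see <a href=\"https://example.com\">docs</a> now".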

Anyway, even though it is far from perfect, the code in this answer should still help you get started with a regex-based parsing system.
