Question

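I have written a set of Python 3 scripts that take a formatted text file and move the data into a SQLite database. The data in the database is then used as part of a PHP application. The data in my text file has formatting marks for bold and italics, but not in any form a browser can understand. The formatting scheme is this: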

fi:xxxx        (italics on the word xxxx (turned off at the word break))
fi:{xxx…xxx}   (italics on the word or phrase in the curly brackets {})
fb:xxxx        (bold on the word xxxx (turned off at the word break))
fb:{xxx}       (bold on the word or phrase in the brackets {})
fv:xxxx        (bold on the word xxxx (turned off at the word break))
fv:{xxx…xxx}   (bold on the word or phrase in the brackets {})
fn:{xxx…xxx}   (no formatting)

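I want to convert each line of source text into (1) a line containing the string with HTML tags in place of the source formatting, and (2) another line containing the string stripped of all formatting marks. I need both the formatted and the stripped line for every source line, even if no formatting marks are used on that line. In the source data, multiple formatting marks, in the same or a different order, may appear on one line, but you will never find a mark that is not closed before the end of that line.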

Answer 1

To format the parts in curly brackets, you could do something like this:

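tags = {"fi": "i", "fb": "b", "fv": "b"}  # source code -> HTML tag; fn stays plain
while ":{" in text:
    index = text.find(":{")
    tag = tags.get(text[index-2:index])
    opening = "<" + tag + ">" if tag else ""  # fn (or any unrecognised code) gets no tag
    closing = "</" + tag + ">" if tag else ""
    text = text[:index-2] + opening + text[index+2:]  # drop "fX:{", insert the opening tag
    text = text.replace("}", closing, 1)  # close at the first remaining "}"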

This turns "other fb:{bold text} text" into "other <b>bold text</b> text".
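The same brace handling can also be written as a single regular-expression pass. This is only a sketch: the `TAGS` mapping and the `convert_braced` name are assumptions made for illustration, with the mapping read off the scheme in the question (fi is italic, fb and fv are bold, fn is unformatted), and it assumes braces are never nested.

```python
import re

# Assumed mapping, taken from the scheme in the question: fi is italic,
# fb and fv are both bold, fn carries no formatting.
TAGS = {"fi": "i", "fb": "b", "fv": "b", "fn": None}

# Matches fX:{...} where the braces contain no nested braces.
BRACED = re.compile(r"\b(fi|fb|fv|fn):\{([^{}]*)\}")

def convert_braced(text):
    """Replace every fX:{...} span with the matching HTML element."""
    def repl(match):
        tag = TAGS[match.group(1)]
        inner = match.group(2)
        # fn has no tag, so only the inner text is kept.
        return inner if tag is None else f"<{tag}>{inner}</{tag}>"
    return BRACED.sub(repl, text)
```

For example, `convert_braced("other fb:{bold text} text")` gives `"other <b>bold text</b> text"`.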

Then you can convert the space-delimited parts:

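tags = {"fi:": "i", "fb:": "b", "fv:": "b"}  # prefix -> HTML tag
words = text.split(" ")
for i, word in enumerate(words):
    tag = tags.get(word[:3])
    if tag:
        # Write back into the list; rebinding `word` alone would change nothing.
        words[i] = "<" + tag + ">" + word[3:] + "</" + tag + ">"
text = " ".join(words)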
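Splitting on spaces keeps trailing punctuation attached to the word, so a comma or full stop would end up inside the tag. One possible alternative, sketched below, is a regular expression that stops at the first non-word character. It assumes that "word break" in the question means the end of a run of letters, digits, or underscores; `convert_bare` and `TAGS` are names made up for this example.

```python
import re

# Assumed mapping from the question's scheme; fn has no bare-word form there.
TAGS = {"fi": "i", "fb": "b", "fv": "b"}

# \w+ cannot match "{", so the braced form fX:{...} is left untouched.
BARE = re.compile(r"\b(fi|fb|fv):(\w+)")

def convert_bare(text):
    """Wrap each fX:word in the matching HTML element."""
    def repl(match):
        tag = TAGS[match.group(1)]
        return f"<{tag}>{match.group(2)}</{tag}>"
    return BARE.sub(repl, text)
```

With this pattern, `convert_bare("see fi:word, then fb:bold.")` leaves the comma and the full stop outside the tags.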

If you want the plain text, just replace tags such as "<b>" and "</b>" with the empty string "".
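A minimal sketch of that stripping step, assuming the converted line contains only the four tags `<b>`, `</b>`, `<i>`, and `</i>` and that the source text never contains those sequences literally (`strip_tags` is a name chosen for this example):

```python
import re

# Matches exactly <b>, </b>, <i> and </i>.
TAG = re.compile(r"</?[bi]>")

def strip_tags(html_line):
    """Return the line with the bold and italic tags removed."""
    return TAG.sub("", html_line)
```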

If the formatting never spans multiple lines, you may get better performance by reading and converting line by line:

def convert(text):
    # Change text here.
    return text

# The with block closes both files even if convert() raises.
with open("file.txt", "r") as in_file, open("file.out", "w") as out_file:
    for line in in_file:
        out_file.write(convert(line))
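Since the question's end goal is a SQLite database holding a formatted and a stripped version of every line, the same line-by-line loop could write rows instead of a file. The sketch below is an assumption-heavy illustration: the table name `lines`, its columns, and the `convert_line` callable (expected to return an `(html, plain)` tuple for one line) are all hypothetical.

```python
import sqlite3

def load_lines(conn, lines, convert_line):
    """Store one (html, plain) row per source line.

    convert_line is a hypothetical callable that takes one line of source
    text and returns a (html, plain) tuple.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lines ("
        "id INTEGER PRIMARY KEY, html TEXT, plain TEXT)"
    )
    # Strip the trailing newline so it is not stored in the database.
    rows = (convert_line(line.rstrip("\n")) for line in lines)
    conn.executemany("INSERT INTO lines (html, plain) VALUES (?, ?)", rows)
    conn.commit()
```

Because an open file is iterable line by line, it can be passed directly as `lines`, for example `load_lines(conn, in_file, convert_line)` inside a `with open(...)` block.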
