Question

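I would like to know how to format a text file using Python or any other programming/scripting language.

The current format in the text file looks like this: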

ABALONE
Ab`a*lo"ne, n. (Zoöl.)

Defn: A univalve mollusk of the genus Haliotis. The shell is lined
with mother-of-pearl, and used for ornamental purposes; the sea-ear.
Several large species are found on the coast of California, clinging
closely to the rocks.

I want it to look like this (everything on one line, with certain words left out, etc.):

ABALONE : A univalve mollusk of the genus Haliotis. The shell is lined with

Answer 1

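Assuming the format is always exactly as you describe (word, pronunciation, blank line, "Defn:", definition), this is a simple matter of splitting and joining strings: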

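def reformat(text):
    # Split into headword, pronunciation, blank line, and the rest (the definition)
    lines = text.split('\n', 3)
    word = lines[0]
    # Drop the 'Defn:' prefix and the whitespace around the definition
    definition_paragraph = lines[3][len('Defn:'):].strip()
    # Join the wrapped definition into a single line
    definition_line = definition_paragraph.replace('\n', ' ')
    return word + ' : ' + definition_line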

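The idea is to write a piece of code that can easily be called to fix the text. Here the function is called reformat. It splits the given text into the first three lines plus the definition paragraph, extracts the definition from that paragraph, and glues the word and the definition together.

Another solution is a regular expression, which is better suited to the task but may be harder to understand because of its odd syntax: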

import re
pattern = re.compile('(.+?)\n.+?\n\nDefn: (.+)', re.DOTALL)
def reformat(text):
    word, definition = pattern.search(text).groups()
    return word + ' : ' + definition.replace('\n', ' ')

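This should give the same result as the code above, but it is simpler, more flexible, and can be ported to other languages.

To use either of them, just call the function and pass the text as an argument.

To replace the text in a file, you need to open the file, read its contents, reformat them with either of the functions above, and save the result back to the file: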
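As a quick check, here is a minimal sketch that calls the regular-expression version on the sample entry from the question. The variable names sample and result are only illustrative, and the sketch assumes the entry has exactly the layout described above.

```python
import re

# Same pattern as in the answer: headword, pronunciation line,
# blank line, then "Defn: " followed by the definition.
pattern = re.compile('(.+?)\n.+?\n\nDefn: (.+)', re.DOTALL)

def reformat(text):
    word, definition = pattern.search(text).groups()
    return word + ' : ' + definition.replace('\n', ' ')

# Sample entry copied from the question ("sample" is an illustrative name).
sample = (
    'ABALONE\n'
    'Ab`a*lo"ne, n. (Zoöl.)\n'
    '\n'
    'Defn: A univalve mollusk of the genus Haliotis. The shell is lined\n'
    'with mother-of-pearl, and used for ornamental purposes; the sea-ear.\n'
    'Several large species are found on the coast of California, clinging\n'
    'closely to the rocks.'
)

result = reformat(sample)
print(result)
```

This should print one line that starts with "ABALONE : A univalve mollusk" and contains the whole definition. If the text does not match the expected layout, pattern.search returns None and the call fails, so a real script may want to check for that first.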

with open('word.txt') as open_file:
    text = open_file.read()

with open('word.txt', 'w') as open_file:
    open_file.write(reformat(text))

If you need to do this for all files in a given directory, for example, look at listdir in the os module.
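A rough sketch of the directory case might look like the following. The helper name reformat_directory, the .txt filter, and the UTF-8 encoding are all assumptions made for illustration and are not part of the question. Files that do not match the pattern are left untouched.

```python
import os
import re

pattern = re.compile('(.+?)\n.+?\n\nDefn: (.+)', re.DOTALL)

def reformat(text):
    word, definition = pattern.search(text).groups()
    return word + ' : ' + definition.replace('\n', ' ')

def reformat_directory(directory):
    """Reformat every .txt file in `directory` in place (hypothetical helper).

    Returns the names of the files that were rewritten.
    """
    changed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Assumption: only regular files ending in .txt hold entries.
        if not os.path.isfile(path) or not name.endswith('.txt'):
            continue
        # Assumption: the files are UTF-8 encoded.
        with open(path, encoding='utf-8') as open_file:
            text = open_file.read()
        # Skip files that do not look like a dictionary entry.
        if pattern.search(text) is None:
            continue
        with open(path, 'w', encoding='utf-8') as open_file:
            open_file.write(reformat(text))
        changed.append(name)
    return changed
```

Calling reformat_directory('some_folder') would then rewrite each matching file and return the list of names it changed. Since the files are overwritten, it is worth trying this on a copy of the data first.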
