Question

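I want to extract certain sections of a file (in txt format), but these sections repeat many times. I want to extract all of them and write them to a new file.

For example, here is a text file that contains several different patterns.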

aaaaaa

\begin{theorem} aaaaaaaaaa \end{theorem}

bbbb

\begin{theorem}

aaaaaaaaaa

\end{theorem}

\begin{theorem} aaaaaaaaaa

\end{theorem}

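I want to extract all the lines between each \begin{theorem} and \end{theorem}, including the \begin{theorem} and \end{theorem} lines themselves, and write them to a new text file. Here is the code I tried, but it writes nothing to my output file.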

inFile = open("infile.txt")
outFile = open("outfile.txt", "w")
keepCurrentSet = False
for line in inFile:
    if line.startswith("\end{theorem}"):
        keepCurrentSet = False
    index +=1
    if keepCurrentSet:
        outFile.write(line)

    if line.startswith("\begin{theorem} "):
        keepCurrentSet = True
        index1 +=1
inFile.close()
outFile.close()

Answer 1

You can do this with a regular expression. I'm assuming you already know how to read and write text files, so I haven't included that code!

import re

# `text` must hold the contents of your text file as a single string.
# Replace HeadingA and HeadingB with the actual start and end markers
# between which you need to extract text.
p = r'(\bHeadingA\b.*?\bHeadingB\b)'
m = re.findall(p, text, re.I | re.M | re.DOTALL)
print(m)  # write the items of m to any new text file
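
Applied to the markers in the question, the same idea might look like the sketch below. The helper names extract_blocks and extract_to_file are made up for illustration and are not part of the original answer. The \b word boundaries are dropped here, because a boundary in front of a backslash only matches directly after a word character, and the backslash and braces are escaped so that they are matched literally.

```python
import re

# Non-greedy match from \begin{theorem} to the nearest \end{theorem}.
# re.DOTALL lets "." match newlines, so multi-line blocks are captured too.
THEOREM = re.compile(r"\\begin\{theorem\}.*?\\end\{theorem\}", re.DOTALL)


def extract_blocks(text):
    """Return every theorem block found in text, markers included."""
    return THEOREM.findall(text)


def extract_to_file(in_path, out_path):
    """Copy all theorem blocks from in_path to out_path (hypothetical helper)."""
    with open(in_path) as in_file:
        blocks = extract_blocks(in_file.read())
    with open(out_path, "w") as out_file:
        for block in blocks:
            out_file.write(block + "\n")
    return len(blocks)
```

With the file names from the question, the call would be extract_to_file("infile.txt", "outfile.txt").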

Answer 2

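You should use regular expressions (documentation here).

Break the problem into parts.

As with every programming problem, you should split it into smaller problems. In your case, I would do it like this: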

1. Find the words you need.

2. Get the range between each occurrence.

3. Copy the text to a new file.

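Let's solve it part by part. To find the occurrences, you can use str.find(), or re.finditer() from the regex module re (regex is short for regular expression). Either one tells you the index at which a word occurs. Do this separately for A and for B, so that you know where each of them is.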
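
A minimal sketch of this index-based approach, assuming A and B are the literal markers \begin{theorem} and \end{theorem} from the question. The function names find_positions and slice_blocks are illustrative only, and re.finditer() is used here to collect the indices.

```python
import re

BEGIN = r"\begin{theorem}"
END = r"\end{theorem}"


def find_positions(marker, text):
    """Return the start index of every occurrence of the literal marker."""
    # re.escape makes the backslash and braces match literally.
    return [m.start() for m in re.finditer(re.escape(marker), text)]


def slice_blocks(text):
    """Pair each begin index with the first end index after it, then slice."""
    ends = find_positions(END, text)
    blocks = []
    for start in find_positions(BEGIN, text):
        # First end marker located after this begin marker, if there is one.
        stop = next((e for e in ends if e > start), None)
        if stop is not None:
            blocks.append(text[start:stop + len(END)])
    return blocks
```

Each returned string runs from a begin marker through its matching end marker, so the list can be written to the new file as it is.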

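Note: a more advanced approach is to build a regular expression such as (A)(.*?)(B), because it matches everything between an occurrence of A and the next occurrence of B (use the re.DOTALL flag if the text spans several lines). It is also easier.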
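
To illustrate the note above, here is a small sketch of the three-group pattern, with A and B standing in for the real markers. It assumes a non-greedy .*? in the middle group and the re.DOTALL flag, so that each match stops at the nearest B and may span several lines. The function name between is hypothetical.

```python
import re


def between(a, b, text):
    """Return (A, middle, B) tuples for every A...B span in text."""
    # re.escape treats the markers as literal text, not as regex syntax.
    pattern = "(" + re.escape(a) + ")(.*?)(" + re.escape(b) + ")"
    return re.findall(pattern, text, re.DOTALL)
```

Joining the three parts of a tuple gives the block together with its markers. With a greedy .* instead, the first match would run all the way to the last B in the text.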

As for steps two and three, once you understand step one they are straightforward.

Good luck!
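
For comparison, steps two and three can also be done line by line, closer to the attempt in the question. The sketch below is only one possible way to do it, and the name keep_theorem_lines is made up. It writes the markers as raw strings, since in an ordinary string literal "\b" is a backspace character and would never match the text in the file. It also checks for the begin marker before writing and for the end marker after writing, so both marker lines are kept.

```python
BEGIN = r"\begin{theorem}"
END = r"\end{theorem}"


def keep_theorem_lines(lines):
    """Yield the lines that lie inside a theorem block, markers included."""
    inside = False
    for line in lines:
        if BEGIN in line:
            inside = True   # start keeping lines, including this one
        if inside:
            yield line
        if END in line:
            inside = False  # stop after the end marker line was kept
```

Because open file objects iterate over lines, the result can be passed straight to writelines() on the output file. Whole lines are kept, so any text that shares a line with a marker is copied as well.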

Extracting specific parts of a file and writing them to a new file
