Question

I have a big text file, and I need to extract the description information from it:

#### **Description**

20_Ways_To_Make_100_Dollars_EVERYDAY !!!  
High Quality Guide (PDF File)  
Here; I will teach you how to make 100 dollars every, or may be even more!  
Buy the guide to get this secret method. ! worth more than you pay!  
Good luck to everyone!



#### **Ships To**

Worldwide

The text I want starts after "Description" and ends before "#### Ships To". How can I do this with Python? I need this output:

20_Ways_To_Make_100_Dollars_EVERYDAY !!!  
High Quality Guide (PDF File)  
Here; I will teach you how to make 100 dollars every, or may be even more!  
Buy the guide to get this secret method. ! worth more than you pay!  
Good luck to everyone!

Answer 1

Assuming your file has more kinds of headings after '####', I suggest using a stricter format criterion when parsing the file:

import re #regular expressions module

with open('text_to_process.txt', 'r') as file: #opening your file; it is closed automatically

    text = file.readlines()

flag = False #flag to mark start/end of description

for line in text:
    if re.match(r"#### \*\*Description\*\*", line):
        flag = True
        continue
    if flag: 
        if not re.match("####", line):
            print(line.strip()) #just printing the line, alternatively you could write it into file or variable
        else:
            flag = False
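
Since the question mentions a big file, the same flag logic can also be applied while iterating lazily instead of loading every line with readlines(). The following is only a sketch under that assumption: `iter_description`, `START` and `ANY_HEADING` are hypothetical names, and `io.StringIO` stands in for the real file so the example is self-contained.

```python
import io
import re

# Same patterns as in the loop above, compiled once.
START = re.compile(r"#### \*\*Description\*\*")
ANY_HEADING = re.compile(r"####")

def iter_description(lines):
    """Yield stripped lines found between the Description heading and the next heading."""
    inside = False  # plays the role of `flag`
    for line in lines:
        if START.match(line):
            inside = True
        elif inside:
            if ANY_HEADING.match(line):
                inside = False
            else:
                yield line.strip()

# io.StringIO is a stand-in for an open text file.
sample = io.StringIO(
    "#### **Description**\n"
    "\n"
    "High Quality Guide (PDF File)  \n"
    "Good luck to everyone!\n"
    "\n"
    "#### **Ships To**\n"
    "\n"
    "Worldwide\n"
)
result = list(iter_description(sample))
print("\n".join(result).strip())
```

With a real file, the generator would be fed the open file object directly, for example `with open('text_to_process.txt') as f:` followed by a loop over `iter_description(f)`, so only one line is held in memory at a time.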

Answer 2

If you know exactly what the headings look like, try:

in_description = False
part = ""
for line in file:  # assumes `file` is an already opened file object
    if not in_description:
        # the heading line itself is not added to part
        in_description = '**Description**' in line
    else:
        in_description = '**Ships To**' not in line
        if in_description:
            part += line

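Apologies for any capitalization errors, I'm on my phone. What this code does (assuming you have an open file) is read each line and check whether in_description should become true. If it is true, it makes sure the line is not the closing heading, and if it is not, it appends the line to part. I'm not at a computer, so I'm not 100% sure whether you need a '\n' at the end of each line (i.e. whether you need "part + '\n'"), but if everything comes out on one line, then you need it. I suggest making these constants as specific as possible, including some of the #s.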
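
On the '\n' question: lines obtained by iterating over a text file keep their trailing newline, so plain concatenation should already preserve the line breaks. A minimal sketch to check this, where `extract_part` is a hypothetical wrapper around the loop above and `io.StringIO` stands in for the open file:

```python
import io

def extract_part(lines):
    """Concatenate the lines between the Description and Ships To headings."""
    in_description = False
    part = ""
    for line in lines:
        if not in_description:
            in_description = '**Description**' in line
        elif '**Ships To**' in line:
            in_description = False
        else:
            part += line  # each line still carries its own '\n'
    return part

# io.StringIO is a stand-in for an open text file.
sample = io.StringIO(
    "#### **Description**\n"
    "\n"
    "High Quality Guide (PDF File)\n"
    "Good luck to everyone!\n"
    "\n"
    "#### **Ships To**\n"
    "\n"
    "Worldwide\n"
)
part = extract_part(sample)
print(part.strip())
```

The surrounding blank lines end up in `part` as well, which is why the sketch calls `strip()` before printing.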

Answer 3

Iterate through the file until the Description line is found, then
print lines until the Ships To line is found:

with open('data', 'r') as f:
    # iterate through f until Description line found
    for line in f:
        if line.startswith('#### **Description**'):
            break
    # print lines until Ships To line is found
    for line in f:
        if line.startswith('#### **Ships To**'):
            break
        print(line, end='')  # line already ends with a newline

break terminates the for-loop. But since f is an iterator, the next for-loop starts where the previous for-loop left off. Thus, the two for-loops together make only one pass through the file.
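
To illustrate the iterator point, here is a small sketch that wraps the two loops in a function. The name `read_description` is hypothetical, and `io.StringIO` stands in for the file. Note that a plain list is not an iterator: each for-loop over a list would restart from the beginning, so a list has to be wrapped in `iter()` to get the same behaviour.

```python
import io

def read_description(f):
    """Return the text between the Description and Ships To headings.

    `f` must be an iterator (an open file, or iter(some_list)), so that
    the second loop continues where the first one stopped.
    """
    for line in f:
        if line.startswith('#### **Description**'):
            break
    collected = []
    for line in f:
        if line.startswith('#### **Ships To**'):
            break
        collected.append(line)
    return ''.join(collected).strip()

sample = (
    "intro\n"
    "#### **Description**\n"
    "\n"
    "Good luck to everyone!\n"
    "\n"
    "#### **Ships To**\n"
    "\n"
    "Worldwide\n"
)

from_file_like = read_description(io.StringIO(sample))
from_list = read_description(iter(sample.splitlines(keepends=True)))
print(from_file_like)
```

If the Description heading never appears, the first loop exhausts the iterator and the function returns an empty string.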

Python: extracting a text description from a big text file
