I have a big text file, and I need to extract the description information from it:
#### **Description**
20_Ways_To_Make_100_Dollars_EVERYDAY !!!
High Quality Guide (PDF File)
Here; I will teach you how to make 100 dollars every, or may be even more!
Buy the guide to get this secret method. ! worth more than you pay!
Good luck to everyone!
#### **Ships To**
Worldwide
The part I want starts after "Description" and ends at "#### Ships To". How can I do this with Python? I need this output:
20_Ways_To_Make_100_Dollars_EVERYDAY !!!
High Quality Guide (PDF File)
Here; I will teach you how to make 100 dollars every, or may be even more!
Buy the guide to get this secret method. ! worth more than you pay!
Good luck to everyone!
Answer 0 (score: 1)
Assuming your messages contain more kinds of sections after '####', I suggest using stricter format criteria when parsing the file:
import re  # regular expressions module

# read all lines of the file; the with-block closes it automatically
with open('text_to_process.txt', 'r') as file:
    text = file.readlines()

flag = False  # True while we are inside the description section
for line in text:
    if re.match(r"#### \*\*Description\*\*", line):
        flag = True
        continue
    if flag:
        if not re.match("####", line):
            # just printing the line; alternatively, write it to a file or a variable
            print(line.strip())
        else:
            flag = False
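As a rough sketch (not part of the original answer), the same flag logic could be wrapped in a function that returns the lines instead of printing them. The name `extract_description` and the in-memory sample built with `io.StringIO` are assumptions made for illustration only:

```python
import io
import re

# Hypothetical helper: same flag logic as above, but collecting the lines.
DESCRIPTION_HEADER = re.compile(r"#### \*\*Description\*\*")
ANY_HEADER = re.compile(r"####")


def extract_description(lines):
    """Return the stripped lines found between the Description header
    and the next '####' header."""
    collected = []
    inside = False  # True while we are inside a description section
    for line in lines:
        if DESCRIPTION_HEADER.match(line):
            inside = True
            continue
        if inside:
            if ANY_HEADER.match(line):
                inside = False  # any other header ends the section
            else:
                collected.append(line.strip())
    return collected


# In-memory stand-in for the real file, shortened from the question's sample.
sample = io.StringIO(
    "#### **Description**\n"
    "High Quality Guide (PDF File)\n"
    "Good luck to everyone!\n"
    "#### **Ships To**\n"
    "Worldwide\n"
)
print(extract_description(sample))
```

Because the function accepts any iterable of lines, it should work equally on an open file object or on a list from `readlines()`.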
Answer 1 (score: 0)
If you know exactly what the headers look like, try:
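in_description = False
part = ""
for line in file:  # assumes `file` is an already opened file object
    if not in_description:
        # the header line switches collecting on, but is not collected itself
        in_description = '**Description**' in line
    elif '**Ships To**' in line:
        in_description = False  # the next header ends the description
    else:
        part += line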
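Apologies for any capitalization errors; I'm on my phone. What this code does (assuming you have an open file) is read each line and check whether it should turn in_description to true. If it is true, it makes sure the line is not the closing header, and if it is not, it appends the line to part. I'm not at a computer, so I'm not 100% sure whether you need a '\n' at the end of each line (i.e. whether you need "part + '\n'"), but if everything comes out on a single line, then you need it. I suggest making those constants as specific as possible, including some of the #s.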
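On the '\n' question, a quick check suggests the extra newline is normally not needed, because lines obtained by iterating over a text file keep their line ending. The sketch below uses `io.StringIO` as a stand-in for an opened file (an assumption for illustration; the sample text is shortened from the question):

```python
import io

# Hypothetical in-memory stand-in for an opened text file.
file = io.StringIO(
    "#### **Description**\n"
    "High Quality Guide (PDF File)\n"
    "Good luck to everyone!\n"
    "#### **Ships To**\n"
    "Worldwide\n"
)

lines = list(file)

# Each line still carries its trailing newline character.
print(repr(lines[1]))

# Plain concatenation therefore keeps one line per row.
part = lines[1] + lines[2]
print(part, end="")
```

The one exception worth keeping in mind is the very last line of a file, which may lack a trailing newline.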
Answer 2 (score: 0)
Find the Description line, and then the Ships To line:
with open('data', 'r') as f:
    # iterate through f until Description line found
    for line in f:
        if line.startswith('#### **Description**'):
            break
    # print lines until Ships To line is found
    for line in f:
        if line.startswith('#### **Ships To**'):
            break
        # each line already ends with a newline, so suppress print's own
        print(line, end='')
break terminates the for-loop. But since f is an iterator, the next for-loop starts where the previous for-loop stopped. Together, the two for-loops therefore make only a single pass through the file.
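The resume-where-you-stopped behaviour is a property of iterators in general, not only of files. A small sketch with a plain list iterator (the marker strings "START" and "END" are made up for illustration) shows the second loop picking up right after the first loop's break:

```python
# A list iterator stands in for the file object here.
lines = iter(["intro", "START", "a", "b", "END", "tail"])

# First loop: consume items up to and including the start marker.
for line in lines:
    if line == "START":
        break

# Second loop: continues from the item after "START".
collected = []
for line in lines:
    if line == "END":
        break
    collected.append(line)

print(collected)

# Whatever the loops did not consume is still waiting in the iterator.
remaining = list(lines)
print(remaining)
```

Iterating over the list itself instead of `iter(...)` would restart from the first item in each loop, which is why this approach relies on the file object being its own iterator.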